Representations in Comprehension


ABSTRACT 

Comprehension of computer programs involves detecting or inferring different
kinds of relations between program parts. Different kinds of programming
knowledge facilitate detection and representation of the different textual
relations. The present research investigates the role of programming
knowledge in program comprehension and the nature of mental representations of
programs; specifically, whether procedural (control flow) or functional (goal
hierarchy) relations dominate programmers' mental representations of programs.
In the first study eighty professional programmers were tested on
comprehension and recognition of short computer program texts. The results
suggest that procedural rather than functional units form the basis of expert
programmers' mental representations, supporting work in other areas of text
comprehension showing the importance of text structure knowledge in
understanding. In a second study forty professional programmers studied and
modified programs of moderate length. Results support conclusions from the
first study that programs are first understood in terms of their procedural
episodes. However, results also suggest that a programmer's task goals may
influence the relations that dominate mental representations later in
comprehension.

INTRODUCTION 

Computer programming is a complex cognitive task composed of a variety of
subtasks and involving several kinds of specialized knowledge (Pennington &
Grabowski, 1986). A skilled computer programmer must understand the problem
to be solved, design a solution, code the solution into a programming
language, test the program's correctness, and be able to comprehend written
programs. These different aspects of programming require knowledge of the
real world problem domain, such as statistics, banking, or physics; knowledge
of design strategies and useful design components; knowledge of programming
language syntax, text structure rules, and programming conventions; knowledge
of computer features that impact program implementation; and knowledge of the
user of the program. Central questions in the study of cognitive skills in
general and of programming in particular concern the nature of expert
knowledge and how various types of knowledge influence skilled performances
(Bisanz & Voss, 1981; Chi, Glaser, & Rees, 1982; Kieras, 1985; Miller, 1985).

The present research focuses on the subtask of computer program
comprehension, an important part of computer programming skill from both
practical and theoretical perspectives. It is estimated that more than 50% of
all professional programmer time is spent on "program maintenance" tasks that
involve modifications and updates of previously written programs. Because the
programs are most often written by other programmers, comprehension plays a
central role in this endeavor. From a theoretical perspective, comprehension
involves the assignment of meaning to a particular program, an accomplishment
that requires the extensive application of specialized knowledge. Thus the
study of program comprehension provides an effective means for studying the
role of particular kinds of knowledge in cognitive skill domains.

The general approach employed in the present research is to regard a
computer program as a text. Because programs are instructions to a computer,
the closest analogs among natural language texts are instructions about how to
perform a particular task, often referred to as procedural instructions.
Procedural instructions and programs also share the feature that the text can
be "executed" to accomplish a goal.

One advantage to viewing programs as texts is that theories and methods
in the study of text comprehension are relatively well developed so that
widely accepted characterizations of text comprehension can serve as a
starting point for thinking about the comprehension of programs. According to
the dominant view, various knowledge structures (often referred to as schemas
or frames) relevant to the text are activated in the course of comprehension
of the text (e.g., Adams & Collins, 1979; Rumelhart, 1980). For example, if a
person is reading a story about a trip to France, knowledge about how stories
typically proceed (a story schema) as well as more specific content knowledge
about vacation trips, the parts of France, etc., would be activated. Schemas
that are verified (or persist) provide the perspective from which the text is
understood, allow the reader to account for and interpret information
explicitly mentioned in the text, and enable inferences to be made about
information not mentioned. For example, the reader may determine after a while
that the story involves a business trip rather than a vacation trip so that
information initially interpreted in the context of a vacation may be
reinterpreted in terms of knowledge about business trips. This process
results in a mental representation of the text that is influenced by
information and structure in the stimulus text as well as information and
structure provided by activated knowledge. The memory representation of the
text is assumed to have levels. One of the most widely cited theories
distinguishes between a microstructure level consisting of propositions and
their interrelations that correspond closely to the text and a macrostructure
level consisting of a smaller number of propositions that characterize the
text at a more abstract level (Kintsch & van Dijk, 1978). The theory implies
that a key process in text comprehension involves chunking the text into
segments that correspond to schema categories so that labels for segments will
constitute the macrostructure for the text (Kintsch, 1977). In other words,
the structure of activated knowledge is an organizing framework for the mental
representation of the text at the macrostructure level. Thus mental
representations of text and the related knowledge structures are linked in the
comprehension process.

The purpose of the present research is to explore the role of two kinds
of programming knowledge -- text structure knowledge (Basili & Mills, 1982;
Curtis, Forman, Brooks, Soloway, & Ehrlich, 1984) and plan knowledge (Soloway
& Ehrlich, 1984) -- that might describe macrostructures in the construction of
mental representations of program texts. These kinds of knowledge have
analogs in other text comprehension domains (see Mandler, 1984 as well as
Britton & Black, 1985 for many examples); they play a special role in
understanding procedural instructions and programs because complete
comprehension of programs (and other texts) requires understanding multiple
relations between parts of the text that are difficult to view simultaneously.
Thus, the nature of the macrostructure will determine which aspects of the
text will be relatively easier or more difficult to understand.

In the sections that follow, our analysis of program comprehension begins
with analyses of the computer program stimulus structures. These analyses are
abstractions of the text and they are intended to illustrate features of the
text (not mental entities) that may or may not be detected during
comprehension. We then describe two kinds of programming knowledge structures
that are involved in computer program comprehension and propose two
alternative hypotheses concerning the kind of knowledge that plays an
organizing role in the mental representation of the text. Because there are
correspondences between certain abstractions of the text and particular types
of knowledge, the kind of knowledge that provides organizing structure in the
resulting mental representation of the text will have implications for which
text features are explicitly included in the mental representation. These
hypotheses and their implications are tested in two empirical studies.

MULTIPLE  ABSTRACTIONS  OF  COMPUTER  PROGRAM  TEXT 

For computer programs, as for other types of texts, there are different
kinds of information implicit "in the text" that must be detected in order to
fully understand the program (Green, 1980; Green, Sime, & Fitter, 1980;
Pennington, 1982). For example, the sequence of statements in the program and
certain keywords provide information about the sequence in which program
statements will be executed. This kind of information is called the control
flow of the program, and understanding a program requires understanding its
control flow. Another kind of information contained in programs, called the
data flow of the program, concerns the changes or constancies in the meaning
or value associated with the names of program objects throughout the course of
the program.

In the illustration that follows, a sample program text is analyzed in
terms of four different kinds of information implicit in the text. Each of
these analyses results in an abstraction of the text that highlights one set
of relations between program parts but obscures others. The analyses are not
intended to be claims about mental representations; rather, these abstractions
are based on formal analyses of programs developed by computer scientists.
Analyses of natural language text in terms of underlying causal, referential,
or logical relations are similar abstractions of text based on different kinds
of information in the text that are relevant to its comprehension (Kintsch,
1974; Meyer, 1975; Trabasso, Secco, & van den Broek, 1982).

The program text to be analyzed is written in COBOL, a programming
language noted for its resemblance to English (see Figure 1.A). This program
solves a toy problem in which a list of clients and their product orders for a
month are processed and average order sizes for two subsets of clients are
computed (see Figure 1.B).

**************************

Insert Figure 1 about here

**************************

A. COBOL PROGRAM SEGMENT

STATEMENT NUMBER

 1  MOVE ZERO TO COUNT-CLIENTS.
 2  MOVE ZERO TO TOTAL-ORDERS.
 3  MOVE ZERO TO INACTIVE-CLIENTS.
 4  READ ORDER-FILE INTO ORDER-REC.
 5  PERFORM SUM-ORDERS UNTIL ORDER-REC-ID = 999999.
 6  COMPUTE ACTIVE-CLIENTS = COUNT-CLIENTS - INACTIVE-CLIENTS.
 7  COMPUTE CLIENT-AVG = TOTAL-ORDERS/COUNT-CLIENTS.
 8  COMPUTE ACTIVE-AVG = TOTAL-ORDERS/ACTIVE-CLIENTS.
 9  DISPLAY MSG-1, ACTIVE-AVG UPON PRINTER.
10  GO TO ORDER-EXIT.
11  SUM-ORDERS.
12  ADD 1 TO COUNT-CLIENTS.
13  ADD ORDER-REC-QUANT TO TOTAL-ORDERS.
14  IF ORDER-REC-QUANT = ZERO ADD 1 TO INACTIVE-CLIENTS.
15  READ ORDER-FILE INTO ORDER-REC.

B. PROBLEM: GIVEN A LIST OF CLIENTS AND THEIR PRODUCT ORDERS
   FOR THIS MONTH, CALCULATE THE AVERAGE QUANTITY
   ORDERED FOR ALL CLIENTS AND THE AVERAGE FOR CLIENTS
   WHO ORDERED DURING THE MONTH. PRINT OUT THE
   AVERAGE ORDER SIZE FOR ORDERING CLIENTS.

The first abstraction of the program text is structured in terms of the
goals of the program, that is, what the program is supposed to accomplish or
produce (see Figure 2). It is labeled a goal hierarchy but could also be
described as a decomposition according to the major program functions or
outputs (cf. Adelson, 1984). The higher level decompositions show that the
program will produce three things: two averages and some printed output. At
the lower levels, subgoals are specified for each higher level goal. For
example, computing the average for the subset of "ordering" clients involves
summing over orders, counting the relevant subset of clients, and dividing.
Notice that in this abstraction there is little explicit information as to how
these goals will be accomplished. For example, the total list of clients
could be searched once to count up the active clients and once again to add up
the order quantities. Alternatively, a single pass through the list could
classify the client as active or not and perform the appropriate count and sum
operations when an active client is encountered. Of course, the
implementation details are in the text but are lost in the abstraction
focusing on functional relations between parts. Some inferences about the
ordering of events can be made from this representation on the basis of
everyday knowledge; for example, the orders must be summed before division can
take place.
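The point that a goal hierarchy leaves the implementation open can be made concrete with a small sketch. The Python below is not part of the original study materials; it assumes a hypothetical record layout of (client id, quantity ordered) pairs ending with the 999999 sentinel from statement 5, and the function names are illustrative. Both functions satisfy the same goals (an average over all clients and an average over ordering clients) while differing in how the client list is traversed.

```python
from itertools import takewhile
from typing import Iterable, List, Tuple

SENTINEL_ID = 999999  # end-of-file client id tested in statement 5 of Figure 1.A

Record = Tuple[int, int]  # (client id, quantity ordered); a hypothetical layout


def averages_single_pass(records: Iterable[Record]) -> Tuple[float, float]:
    """One pass over the list, in the manner of Figure 1.A."""
    count_clients = total_orders = inactive_clients = 0
    for client_id, quantity in records:
        if client_id == SENTINEL_ID:
            break
        count_clients += 1
        total_orders += quantity
        if quantity == 0:
            inactive_clients += 1
    active_clients = count_clients - inactive_clients
    # Assumes at least one active client; otherwise division by zero occurs.
    return total_orders / count_clients, total_orders / active_clients


def averages_two_pass(records: Iterable[Record]) -> Tuple[float, float]:
    """Search the list once to count active clients and once to sum orders."""
    clients: List[Record] = list(takewhile(lambda r: r[0] != SENTINEL_ID, records))
    active_clients = sum(1 for _, quantity in clients if quantity > 0)
    total_orders = sum(quantity for _, quantity in clients)
    return total_orders / len(clients), total_orders / active_clients
```

Each function returns the pair (average for all clients, average for ordering clients). A reader holding only the goal hierarchy could not tell which of the two was written.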

**************************

Insert Figure 2 about here

**************************

A second abstraction of the program text is structured in terms of
program processes that transform the initial data objects into the outputs of
the program (see Figure 3). For example, Figure 3 shows that the data object
"file of client orders" is used by the process "count" to calculate the number
of clients without orders this month. Because the flow of each data object
can be traced through the series of transformations in which it participates,
this is called a data flow abstraction. This abstraction is closely related
to the goal hierarchy shown in Figure 2. For example, the first level
decomposition of goals in the goal hierarchy is to compute averages and print
an average. These correspond to the final data objects at the bottom of the
data flow abstraction shown in Figure 3, which are a printed average and a
computed average. The goal hierarchy can be at least partly recovered from
the data flow abstraction by working up from the bottom, although it requires
the application of knowledge to infer the grouping of subgoals with their
goals. However, in the data flow abstraction, everything that happens to a
particular data object is readily available in a way that is not apparent from
the goal hierarchy. In addition, the data flow abstraction allows more
inferences to be made about the order in which certain operations will occur
than does the goal hierarchy. If an action (marked by a box, e.g., "compute")
has two data objects as inputs (marked by an oval, e.g., "sum of orders",
"number of clients") then the action cannot take place until the data objects
are both available; thus the process that produces a data object (e.g., "sum
orders") must execute prior to the process that consumes it ("compute
average").
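The ordering inference described above (a producer must run before its consumer) can be sketched as a dependency graph. The following Python is an illustration only: Figure 3 is not reproduced here, so the process table is a partial, assumed reading of it, and names such as `PROCESSES` and `uses_of` are hypothetical. It relies on the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# process name -> (input data objects, output data object); an assumed subset
PROCESSES: Dict[str, Tuple[List[str], str]] = {
    "sum orders": (["file of client orders"], "sum of orders"),
    "count clients": (["file of client orders"], "number of clients"),
    "compute average": (["sum of orders", "number of clients"], "average order"),
}


def execution_constraints(processes) -> Dict[str, Set[str]]:
    """Map each process to the processes that must run before it."""
    producer = {output: name for name, (_, output) in processes.items()}
    return {
        name: {producer[obj] for obj in inputs if obj in producer}
        for name, (inputs, _) in processes.items()
    }


def one_valid_order(processes) -> List[str]:
    """Return one execution order consistent with the data flow."""
    return list(TopologicalSorter(execution_constraints(processes)).static_order())


def uses_of(data_object: str, processes) -> List[str]:
    """List every process that produces or consumes the given data object."""
    return sorted(
        name
        for name, (inputs, output) in processes.items()
        if data_object in inputs or output == data_object
    )
```

The data flow fixes that "compute average" comes last but leaves the relative order of "sum orders" and "count clients" open, which is why it supports more ordering inferences than the goal hierarchy yet fewer than a control flow description.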

**************************

Insert  Figure  3  about  here 

************************** 

A third abstraction of the program text, called a control flow
representation or flowchart, is structured in terms of the sequence in which
program actions will occur (see Figure 4). The links between program actions
in this structure represent the passage of execution control instead of the
passage of data as in the data flow abstraction. This form highlights
sequencing information, but conclusions about data flow must be inferred by
looking for repeated data object names. For example, to find out in what
events the "counter for clients" participates (easily determined in the data
flow abstraction, Figure 3) it is necessary to track its use in the sequence
of operations in the control flow abstraction (Figure 4). It is also
difficult to detect goal/subgoal relations quickly. For example, the higher
order goal of computing an average over ordering clients is specified in the
last procedural block of the control flow abstraction, but the subgoal
operations of summing and counting are not explicitly linked to the higher
order goal.
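A short sketch may help show what the control flow view makes easy and what it leaves to be inferred. The Python below is illustrative and rests on assumptions: the statement texts are a reading of the COBOL segment in Figure 1.A, input records are hypothetical (client id, quantity) pairs that end with the 999999 sentinel, and the paragraph label (statement 11) is treated as non-executing. `trace` yields statement numbers in execution order, while `statements_using` recovers data flow only by scanning for repeated names.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

STATEMENTS: Dict[int, str] = {
    1: "MOVE ZERO TO COUNT-CLIENTS.",
    2: "MOVE ZERO TO TOTAL-ORDERS.",
    3: "MOVE ZERO TO INACTIVE-CLIENTS.",
    4: "READ ORDER-FILE INTO ORDER-REC.",
    5: "PERFORM SUM-ORDERS UNTIL ORDER-REC-ID = 999999.",
    6: "COMPUTE ACTIVE-CLIENTS = COUNT-CLIENTS - INACTIVE-CLIENTS.",
    7: "COMPUTE CLIENT-AVG = TOTAL-ORDERS/COUNT-CLIENTS.",
    8: "COMPUTE ACTIVE-AVG = TOTAL-ORDERS/ACTIVE-CLIENTS.",
    9: "DISPLAY MSG-1, ACTIVE-AVG UPON PRINTER.",
    10: "GO TO ORDER-EXIT.",
    11: "SUM-ORDERS.",
    12: "ADD 1 TO COUNT-CLIENTS.",
    13: "ADD ORDER-REC-QUANT TO TOTAL-ORDERS.",
    14: "IF ORDER-REC-QUANT = ZERO ADD 1 TO INACTIVE-CLIENTS.",
    15: "READ ORDER-FILE INTO ORDER-REC.",
}

# A hyphenated COBOL-style word, e.g. COUNT-CLIENTS
NAME = re.compile(r"[A-Z0-9]+(?:-[A-Z0-9]+)*")


def trace(records: Iterable[Tuple[int, int]]) -> Iterator[int]:
    """Yield statement numbers in the order control passes through them."""
    source = iter(records)
    yield from (1, 2, 3)
    client_id, _ = next(source)
    yield 4
    while True:
        yield 5  # the UNTIL test is evaluated before each repetition
        if client_id == 999999:
            break
        yield from (12, 13, 14)
        client_id, _ = next(source)
        yield 15
    yield from (6, 7, 8, 9, 10)


def statements_using(name: str) -> List[int]:
    """Find the statements that mention a data object by scanning the text."""
    return [n for n, text in sorted(STATEMENTS.items()) if name in NAME.findall(text)]
```

Note that the statements mentioning COUNT-CLIENTS are scattered across the initialization, the loop body, and the final computations, which is the tracking burden the paragraph describes.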

************************** 

Insert Figure 4 about here

************************** 

[Figure 2. GOAL HIERARCHY: THE PROGRAM ACCOMPLISHES CERTAIN GOALS BY
PRODUCING OUTPUTS. EACH LEVEL INDICATES A HIGHER ORDER GOAL IS DECOMPOSED
INTO SUBGOALS.]

CALCULATE AVERAGE ORDER, AVERAGE ACTIVE ORDER, AND PRINT AVERAGE ACTIVE ORDER
    COMPUTE AVERAGES
        COMPUTE AVERAGE FOR ORDERING CLIENTS
        COMPUTE AVERAGE FOR ALL CLIENTS
    PRINT AVERAGE ORDER FOR ORDERING CLIENTS

[Figure 4 caption fragment: PROGRAM ACTIONS OCCUR IN A SPECIAL SEQUENCE.]

A fourth abstraction is structured in terms of the program actions that
will result when a particular set of conditions is true (see Figure 5).
This abstraction is like a decision table in which each possible state of the
world is associated with its consequences; it also resembles the production
system condition-action pairs that are used to represent human procedural
knowledge (e.g., Anderson, 1983; Newell & Simon, 1972). In this abstraction,
the program is viewed as being in a particular state at each moment in time.
The state triggers an action, execution of the action results in a new state,
the new state triggers another action, and so on. It is therefore easy to
find out what results if a given set of conditions occurs, and also relatively
easy to find out what sets of conditions can lead to a given action. This kind
of state information is much harder to deduce from the other abstractions.
However, information about the sequence in which actions occur and information
about higher level goals are difficult to extract in the conditionalized
action abstraction.
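The state-triggers-action cycle can be sketched as a tiny rule interpreter. This Python is an assumption-laden illustration rather than a rendering of Figure 5, which is not reproduced here: the three rules, their names, and the state layout are hypothetical, and input is again assumed to be (client id, quantity) pairs ending with the 999999 sentinel.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

SENTINEL_ID = 999999
State = Dict[str, Any]


def run_rules(records: Iterable[Tuple[int, int]]) -> Tuple[Tuple[float, float], List[str]]:
    """Repeatedly fire the first rule whose condition holds in the current state."""
    pending = list(records)
    state: State = {"count": 0, "total": 0, "inactive": 0,
                    "record": pending.pop(0), "averages": None}

    def tally_active(s: State) -> None:
        s["count"] += 1
        s["total"] += s["record"][1]
        s["record"] = pending.pop(0)  # reading the next record yields a new state

    def tally_inactive(s: State) -> None:
        s["count"] += 1
        s["inactive"] += 1
        s["record"] = pending.pop(0)

    def finish(s: State) -> None:
        active = s["count"] - s["inactive"]
        s["averages"] = (s["total"] / s["count"], s["total"] / active)

    # (rule name, condition on the state, action); order resolves overlaps
    rules: List[Tuple[str, Callable[[State], bool], Callable[[State], None]]] = [
        ("end of file", lambda s: s["record"][0] == SENTINEL_ID, finish),
        ("inactive client", lambda s: s["record"][1] == 0, tally_inactive),
        ("active client", lambda s: s["record"][1] > 0, tally_active),
    ]

    fired: List[str] = []
    while state["averages"] is None:
        for name, condition, action in rules:
            if condition(state):
                action(state)
                fired.append(name)
                break
        else:
            raise ValueError("no rule matches the current state")
    return state["averages"], fired
```

Reading the rule table answers "what happens when the quantity is zero?" directly, but the overall sequence and the goal of computing two averages must be reconstructed from the firing history.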

Insert  Figure  5  about  here 

This analysis of the multiple abstractions that characterize a computer
program text also applies to English language instructions, such as training
manuals, recipes, knitting instructions, and assembly instructions. In these
texts, too, information is conveyed about what should be accomplished (goal
hierarchy), how to do it (sequential procedure), the sets of conditions under
which particular actions should be taken (conditionalized action), and the set
of transformations that a particular object should go through (data flow). To
be concrete, consider a hypothetical set of detailed instructions on how to
cook spaghetti carbonara. An abstraction of the instructions in terms of
functions tells "what is to be accomplished." For example, the top level goal
might be specified as "make spaghetti carbonara", with immediate subgoals of
"make spaghetti, make sauce, mix together." A procedural (control flow)
abstraction specifies the order of execution and tells "how to do it." For
example, "assemble the ingredients, heat the water, start the noodles, grate
the cheese, check the noodles" might constitute a partial description of the
sequence of steps. "What happens to particular objects" is portrayed by a
data flow abstraction that traces the series of transformations applied to
each object. For example, the cheese comes out of the refrigerator, is grated,
separated in half, with one-half going to the table and one-half going into an
egg mixture. A condition-action abstraction specifies the conditions that
should trigger certain actions, such as when it is time to heat the bowl.

Comprehension requires the detection and representation of these multiple
relations between parts of the text. From our illustrations it is clear that
these relations are difficult to express simultaneously. The importance of
this is that part of the difficulty of writing clear instructions,
understanding instructions, or understanding programs is due to the tradeoffs
that inevitably occur in how much of each kind of information can be
highlighted simultaneously. Uncertainty about the best way to write
instructions or programs may be largely due to uncertainty about which
structure should serve as the organizing principle for the instructions.

[Figure 5. CONDITIONALIZED ACTION: A SET OF CONDITIONS RESULTS IN THE
EXECUTION OF SOME ACTION(S). THE EXECUTION OF AN ACTION RESULTS IN A NEW
SET OF CONDITIONS.]

It is possible that one of the alternate text abstractions corresponds
more closely than do the others to the structure of the programmer's mental
representation of the program, due to features of the cognitive processing
system and the organization of knowledge used in the comprehension process.
For example, Adelson suggests that a mental representation in terms of
program goals, reflecting text relations specified in Figure 2, characterizes
the natural cognitive representation of experienced programmers while a
procedural representation, reflecting text relations specified in Figure 4, is
most natural for novice programmers. It is also possible that the actual
mental representation used by the programmer will reflect task and programming
language influences in addition to the influences of cognitive capacities and
knowledge structures. For example, different programming languages highlight
different relations outlined in Figures 2-5 (Green, 1980; Green, et al.,
1980). We assume that one mediating factor in correspondences between text
structure and the structure of mental representations is the structure of
programming knowledge activated during comprehension. We now examine how the
structure of human programming knowledge is related to these text abstractions
and to potential forms of mental representations of programs.

PROGRAMMING  KNOWLEDGE 

Various types of knowledge about programming will enable the programmer
to detect and mentally represent the variety of relations that are implicit in
the text. Some kinds of knowledge may be more important than others in
constructing the macrostructure of the mental representation. In the present
research we explore the role of two types of programming knowledge in program
comprehension: knowledge of text structure and knowledge of program plans.
These two kinds of knowledge do not exhaust the potential range of programming
knowledge, but they provide a useful starting point and they have been most
frequently promoted as the cognitively natural bases for program design.

Text  Structure  Knowledge 

Program text can be described in terms of a limited number of control
flow constructs. Although there are differences about the exact number and
description of these control flow constructs, three basic building blocks are
typically included: sequence, in which control passes from one action to the
next; iteration, in which an action is repeated until a specified condition
exists (commonly referred to as looping); and conditional, in which control
passes to different actions depending on which of two or more conditions is
met (sometimes referred to as if-then-else).¹ These units could be called
structured programming units because of an emphasis on disciplined control
structuring according to these constructs by early structured programming
advocates (e.g., Dahl, Dijkstra, & Hoare, 1972). These fundamental units have
also been called prime programs, referring to the idea that a program text can
be decomposed into sequence, iteration, and conditional units in the way that
a number can be decomposed into prime number factors. Prime programs at the
lowest level of decomposition, represented as a single node, can be aggregated
into higher level sequence, iteration, and conditional units (Linger, Mills, &
Witt, 1979; Basili & Mills, 1982) so that the entire program text can be
represented as a hierarchy of prime units. An analysis of the sample program
text (Figure 1.A) in terms of these prime program text structure (TS) units is
shown in Figure 6.A (cf. Curtis, et al., 1984). For example, statements
numbered 1 through 4 in Figure 1.A form a sequence unit as shown in Figure
6.A; statements 5, 11, and the embedded sequence 12 through 15 form an
iteration unit (loop); and statements numbered 6 through 10 form another
sequence unit. A concatenation of the sequence, loop, and sequence units
yields a higher level sequence unit that is the entire program text. Thus the
text is structured as a hierarchy of prime program units.
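The hierarchy of prime units described for the sample program can be written down as a small tree. The sketch below is an illustration, not a reproduction of Figure 6.A: the `Unit` class and its field names are hypothetical, the inner label "loop body" is invented for the example, and statement 14 (an IF) is left as a plain statement because the text describes statements 12 through 15 simply as an embedded sequence.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Unit:
    kind: str   # "sequence", "iteration", or "conditional"
    label: str
    parts: List[Union["Unit", int]] = field(default_factory=list)


def statements(unit: Unit) -> List[int]:
    """Flatten a unit into statement numbers, in decomposition order."""
    numbers: List[int] = []
    for part in unit.parts:
        numbers.extend(statements(part) if isinstance(part, Unit) else [part])
    return numbers


def depth(unit: Unit) -> int:
    """Number of unit levels from this unit down to its deepest nested unit."""
    nested = [depth(p) for p in unit.parts if isinstance(p, Unit)]
    return 1 + max(nested, default=0)


PROGRAM = Unit("sequence", "entire program text", [
    Unit("sequence", "initialization sequence", [1, 2, 3, 4]),
    Unit("iteration", "read loop", [
        5, 11,
        Unit("sequence", "loop body", [12, 13, 14, 15]),
    ]),
    Unit("sequence", "computation sequence", [6, 7, 8, 9, 10]),
])
```

Flattening `PROGRAM` gives 1-4, then 5, 11, 12-15, then 6-10, so the order of the decomposition differs from the surface order of statement numbers in the listing.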

**************************

Insert Figure 6 about here

**************************

The decomposition of a program into control primes and the diagramming of
control flow according to structured programming units are analytic techniques
that can be applied to programs. We refer to programmers' knowledge about
these structured programming units as text structure knowledge (TS knowledge).
Professional programmers could not easily escape exposure to this knowledge in
the course of their programming education. One role that text structure
knowledge could play in comprehension is that of organizing the memory
representation macrostructure, and some researchers have claimed that knowledge
of these structural components plays a central psychological role in program
comprehension. For example, advocates of structured programming have
hypothesized that programs organized according to a strict control construct
discipline are easier to understand and modify because they correspond to the
programmer's mental organization (Dahl, et al., 1972), and that the process of
comprehending undocumented programs is similar to decomposing a program into
prime programs (Linger, et al., 1979).

One presentation of such a comprehension scheme proposes that the mental
representation of a program has a macrostructure organized by control primes,
that is, the sequence, iteration, and conditional text structure units (Atwood
& Ramsey, 1978; Curtis, et al., 1984; Basili & Mills, 1982). In this view,
program comprehension proceeds by identifying sequence, iteration, and
conditional units in the surface structure of the program and deriving their
local purposes. These units then act as items that combine into higher order
sequence, iteration, and conditional units, with higher level functions
attached to units at this level. This process continues until the highest
level is a single unit with an identifiable function. For example, the text
structure (TS) decomposition of the sample program segment shown in Figure 6.A
shows "initialization sequence", "read loop", and "computation sequence" as
the first level macrostructure control primes, and higher levels of
macrostructure are created by their combination (see also Atwood & Jeffries,
1980; Davis, 1984; Mayer, 1977; Shneiderman, 1980). Thus, the text structure
analysis represents a hypothesis about relations between program parts that
organize the semantic representation of text in memory.

The decomposition of program text into text structure (TS) units (Figure
6.A) is most closely related to our earlier analysis of program text in terms
of control flow relations (Figure 4). However, such a decomposition will not
necessarily correspond to the surface ordering of events in the program text,
as shown by the sequence of program text statement numbers in Figure 6.A.

There is some empirical support for the idea that program text structure
knowledge plays an important role in comprehension and possibly an organizing
role in memory. One line of support is provided by evidence that programs
have a psychological "phrase structure" in which the phrases are syntactically
marked by keywords of the programming language: WHILE...DO is a marker that
initiates loops, BEGIN goes with END to mark a sequence, and so forth
(McKeithen, Reitman, Reuter, & Hirtle, 1981; Norcio & Kerst, 1983). In one
study testing free recall memory for program texts (McKeithen, et al., 1981),
statements recalled most frequently by experts but not by novices corresponded
to statements marking the phrase structure. In another study (Norcio & Kerst,
1983) higher proportions of correct to incorrect recall transitions (and vice
versa) occurred at the boundaries of the hypothesized text structure units,
analogous to findings in sentence comprehension research that recall errors
are greater over phrase structure boundary transitions (Fodor, Bever, &
Garrett, 1974; Mitchell & Green, 1978; Tejirian, 1968).

Additional support is provided in a study of program comprehension in
which programmers studied short programs composed of meaningful code,
structured but meaningless code, or randomly arranged lines of code (Schmidt,
1983). More lines of both meaningful and structured but meaningless code were
recalled compared to randomly arranged code. Further, longer study times
occurred at control construct borders, in the same way that reading times are
elevated at episode boundaries in story comprehension (Haberlandt, 1980;
Mandler & Goodman, 1982).

Additional  indirect  support  was  obtained  in  a  study  (Adelson,  1981)  of 
subjective  organization  in  programmers'  (multitrial)  free  recall  of  randomly 
presented  lines  of  program  code.  The  randomly  presented  lines  contained 
statements  that  could  be  viewed  as  three  routines  (5  lines  each)  or  as  five 
syntactic  groupings  (3  lines  each).  Experts  used  program  membership  as  an 
organizing  principle  and  routines  were  grouped  at  a  second  level  by  procedural 
similarity,  and  not  by  the  function  of  the  routine. 

In summary, the idea that text structure units play an organizing role in
memory suggests three main features of program comprehension: (1)
Comprehension proceeds by segmenting statements at the detail level into
phrase-like groupings that then combine into higher order groupings. (2)
Syntactic markings provide surface clues to the boundaries of these segments.
(3) The segmentation reflects the control structure of the program. Thus, in
terms of the multiple abstractions of programs (Figures 2-5), sequence
information should be readily available; data flow connections that occur
across unit boundaries should be relatively more difficult to infer; and
function information should be least accessible since it is most closely
related to data flow and requires coordination across units.

Plan  Knowledge 

A second kind of knowledge, called program plan knowledge (PK knowledge),
emphasizes programmers' understanding that patterns of program instructions
"go together" to accomplish certain functions (Rich, 1981; Soloway, Ehrlich, &
Black, 1983; Soloway, Ehrlich, & Bonar, 1982). Plans correspond to a
vocabulary of intermediate level programming concepts such as searching,
summing, hashing, counting, etc., and there are hundreds (maybe thousands) of
these plans. Like other forms of engineering and design, "there is a craft
discipline among programmers consisting of a repertoire of standard methods of
achieving certain types of goals" (Rich, 1980).

A plan is a structure with roles for data objects, operations, tests, or
other plans, and with constraints on what can fill the roles in a given
instantiation as well as specifications as to data flow and control flow
connecting segments within plans. Plans accomplish things and are
hierarchically linked on the basis of function and role relations; one plan
may be used to accomplish the goals of a higher order plan (Soloway et al.,
1983). For example, a very simple plan is a counter plan that consists of an
initialization part plus an update-by-one part. A plan to compute an average
will include a counter plan as one of its parts. Higher level plans include
such things as a find-first-value-search plan, a merge-two-files plan, or a
bubble-sort plan.
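
As an illustration of how a higher order plan contains simpler plans, the
following minimal sketch annotates an averaging routine with the plan roles
named above. The function name, the variable names, and the guard for empty
input are assumptions introduced here; the studies discussed in this paper
used COBOL and FORTRAN segments, not Python.

```python
def average(values):
    """Average plan built from a counter plan and a running-total plan."""
    count = 0          # counter plan: initialization part
    total = 0          # running-total plan: initialization part
    for value in values:
        count += 1     # counter plan: update-by-one part
        total += value # running-total plan: update part
    if count == 0:     # guard added for this sketch (an assumption)
        return None
    return total / count  # average plan: combines the two sub-plans
```

Note that the two parts of each plan are separated in the text by statements
belonging to other plans, which is why a plan decomposition need not follow
the surface order of the program.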

The specification of plan knowledge in programming has been elaborated by
Rich, Shrobe, and Waters (Rich, 1980, 1981; Rich & Shrobe, 1979; Rich &
Waters, 1981; Shrobe, 1979; Waters, 1979), who have developed a large set of
plans based on their intuitions about programming, and by Soloway and Ehrlich
(Soloway & Ehrlich, 1984; Soloway, et al., 1983; Soloway, et al., 1982), who
have developed a more psychologically motivated theory of plan knowledge. A
plan knowledge (PK) decomposition of the COBOL program segment (Figure 1.A) in
terms of underlying program plans is shown in Figure 6.B (this particular
decomposition is based on work by the MIT Programmer's Apprentice project;
Rich & Shrobe, 1979; see Soloway, et al., 1983 for a similar analysis). As
before, numbers in Figure 6.B refer to program statement numbers specified in
Figure 1.A, and the hierarchical structure of the diagram in Figure 6.B shows
that plans combine together to form higher order plans. For example,
statements 2 and 13 implement a counter plan; the segment as a whole consists
of plans for reading through the inputs, counting, summing, and computing an
average.

Plan knowledge units could also form the comprehension macrostructure,
implying that understanding a program is finding a set of underlying plans
such that parts of the program match the roles in the hypothesized plans.
Comprehension of a program, under this view, would proceed by partial pattern
matches activating candidate plans, causing programmers to search for further
evidence to instantiate a plan. According to this concept of comprehension,
the program is mentally represented as a set of linked descriptions, like
blueprints, rather than as a set of instructions to be executed. Thus, the
plan knowledge analysis also represents a psychological hypothesis about
relations between program parts that might organize the semantic
representation of text in memory as depicted in Figure 6.B.

Plan representations of a program are primarily based on data flow
relations. This is because much of the control structure in a program that is
not mandated by data flow requirements is arbitrary. Thus the plan knowledge
(PK) analysis (Figure 6.B) is most closely related to our earlier analysis of
program text in terms of data flow relations (Figure 3) and function (Figure
2). Such a decomposition also does not necessarily correspond to the text
surface structure, as shown by program text statement numbers in Figure 6.B.

There is also empirical evidence concerning the importance of plan
knowledge in program comprehension. PK representations have been invoked to
explain how expert programmers chunk program text in recall tasks (Greeno &
Simon, 1984) by arguing that plan knowledge is used to code the functions of
the presented program. Details of the program need not be encoded because the
programmer has only to expand a plan into one of its implementations to
reconstruct the detail. This claim corresponds to claims made in research on
natural language processing that encoding efficiency is achieved by activating
scripts (Schank & Abelson, 1977) or other kinds of content schemas such as
goal/plan knowledge about human actions (Schank & Abelson, 1977; Wilensky,
1983).

There is evidence that data flow relations are important in algorithm
design (Kant & Newell, 1984) and program modification (Weiser, 1982), and
evidence that experienced programmers are better than novices at inferring
program function in a comprehension task (Adelson, 1984). These studies do
not address questions about how function and data flow relations are inferred
(i.e., by recognizing plans or in some other way), but they do indicate the
central role of program function in experts' understanding.

Evidence more directly related to plans as critical elements in program
comprehension is provided by Soloway and his colleagues (Soloway & Ehrlich,
1984; Soloway, et al., 1982) using a cloze procedure to show that programmers
will fill in a missing line of a program with a predicted plan element; that
programmers have more difficulty comprehending a program in which the plan
structure has been disrupted; and that experts but not novices can resolve
dilemmas when conflicting cues about which plan to instantiate are provided.
In addition, Brooks (1975) has simulated program composition by specifying a
large set of program plans and processes that operate on them.

In summary, the idea that plan knowledge plays an organizing role in
memory suggests the following features of program comprehension: (1)
Comprehension proceeds by the recognition of patterns that implement known
programming plans. (2) Plans are activated by partial pattern matches and
confirming details are either sought or assumed. (3) The resulting
segmentation reflects the data flow structure of the program indexed by
program function. Thus, in terms of the multiple abstractions of programs
(Figures 2-5), data flow and function information should be readily available;
sequence and detail operations should be less accessible.

RESEARCH  OVERVIEW 

A summary of the correspondences we have proposed between textual
relations (abstractions of program text), knowledge structures, and
hypothesized mental representations is shown in Table 1. Features of the text
activate different kinds of knowledge, some of which will provide an
organizing structure for the mental representation of the text. The rows of
Table 1 represent alternative hypotheses concerning the dominant form of the
mental representation of programs. The structures illustrated in Figures 6.A
and 6.B show in detail the potential alternative meaning structures in memory
corresponding to the TS and PK analyses of one text segment, and we have
outlined a view of comprehension that might lead to each.

Insert  Table  1  about  here 

There are several reasons to be interested in which of these views better
characterizes computer program comprehension. First, well known empirical
results across a wide range of problem solving domains, such as chess (e.g.,
Chase & Simon, 1973a, 1973b), GO (Reitman, 1976), bridge (Engle & Bukstel,
1978), music composition (Halpern & Bower, 1982; Sloboda, 1976) and computer
programming (McKeithen, et al., 1981; Shneiderman, 1976), show that experts
quickly identify meaningful patterns in a problem array that are stored in
memory as chunks of information. These results suggest that for experts an
abstract representation of a problem array is available quite quickly upon
inspection. Much less is known about exactly which principles underlie
experts' superior organization of problem information. The nature of mental
representations of programs and the units that underlie their organization
(e.g., Adelson, 1984; Curtis, et al., 1984; Davis, 1984) are important for
resolving arguments over how programs ought to be structured, understanding
the psychological complexity of programs, and extending insight into skilled
performance to an important complex task. Second, the two modes of
comprehension have different consequences in terms of the kinds of information
that are relatively easy or difficult to abstract from program text (Green,
1980). This in turn is important in determining standards for computer
programming practices, tools, languages, and education.

More broadly, these two views of program comprehension mirror debates in
other areas of text comprehension and composition concerning the ways in which
different kinds of knowledge contribute to text understanding. One kind of
knowledge that has been proposed to influence comprehension is abstract
knowledge of text structure. For example, content free abstract knowledge
about the type and form of components usually included in a story (e.g.,
episodes consisting of identifiable parts such as initiating events, goals,
attempts, and consequences) is often hypothesized to provide organizing memory
structures in story comprehension (e.g., Mandler, 1984; Rumelhart, 19??). A
second kind of knowledge that has been proposed to influence comprehension is
schematic content knowledge. For example, work on "scripts, goals, and plans"
provides evidence that content specific knowledge about typical human action
sequences in specific contexts and knowledge of typical plans that achieve
certain goals provide organizing memory structures in story comprehension
(Black & Bower, 1980; Bruce, 1980; Schank & Abelson, 1977; Thorndyke &
Yekovich, 1980). Arguments about the priority of one or the other type of
knowledge in story comprehension are difficult to resolve since both story
texts and the plans involved in human action sequences tend to have the same
or similar structures (Black & Wilensky, 1979; van Dijk & Kintsch, 1983). In
programming using more traditional languages, text structure knowledge
corresponds to structured programming or prime program units; the units are
few in number and abstract, a kind of "episode" for programs. Plan knowledge
corresponds to schematic content knowledge and there are potentially thousands
of such patterns.

The empirical evidence cited in the previous section concerning each view
of computer program comprehension, in terms of text structure units (TS) or
plan knowledge units (PK), is not definitive with respect to the role of the
two kinds of knowledge in forming memory macrostructures. For example,
superior recall of program statements introducing loops could reflect the
priority of iteration control flow units or attention to key statements that
activate plan knowledge (McKeithen, et al., 1981). Similarly, evidence that
experienced programmers have tacit knowledge regarding awkward program
constructions does not necessarily imply that this knowledge leads to plan
based mental representations (Soloway, et al., 1983). The research reported
in the next sections was designed to operationally identify the form of mental
representations of program texts, providing information about the kinds of
relational information in programs that are most accessible and about the
roles of text structure knowledge and plan knowledge in program comprehension.

In the first study programmers studied very short program texts and
responded to comprehension and memory questions. Short texts were used to
obtain a high degree of experimental control. Although programming studies
have typically used texts of this length, it is desirable to examine
experimental results in more realistic settings. In the second study
programmers engaged in a more natural task in which they studied a program of
moderate length, made a modification to it, and responded to comprehension
questions. Thus the first study provides relatively direct information
concerning the form of mental representations of program text. In the second
study, comprehension data provide indirect evidence concerning the same
questions for a different, more natural task.

STUDY  ONE 

One effective technique for empirically investigating structures in
memory is to index the relative distance between elements in a hypothesized
structure by measuring priming effects in item recognition (McKoon & Ratcliff,
1980; 1984; Ratcliff & McKoon, 1978). In this method, subjects study one or
more texts and are subsequently presented with a recognition test in which
they must decide whether or not each item in the list was in the text they had
just studied. A target item in the test list is preceded in one condition by
another item hypothesized to be in the same cognitive unit as the target and
thus close in the memory structure. In a second condition the target item is
preceded by an item hypothesized to be in a different cognitive unit and thus
farther away in the memory structure. Under the assumption that activation of
an item in the memory structure activates items close to it, especially those
in the same cognitive unit, response time to the target preceded by an item in
the same cognitive unit should be faster than response time to the same target
preceded by an item not in the same cognitive unit (Anderson, 1983; McKoon &
Ratcliff, 1980); that is, a priming effect should occur.

In the first experiment the priming technique was used to examine
distances between program statements in program texts like the one shown in
Figure 1.A. If the representations in memory of the meanings of the program
segment correspond to the structures built by the TS (text structure) or PK
(plan knowledge) decompositions, then the relative amounts of priming between
the items should be predicted by the relative distances between the concepts
in the diagrammed structures. For example, in Figure 6.A, the TS structure,
statement 2 (MOVE 0 TO TOTAL-ORDERS.) should prime statement 4 (READ
ORDER-FILE INTO ORDER-REC.) because they are in the same TS cognitive unit.
However, statement 2 (MOVE 0 TO TOTAL-ORDERS.) should not prime statement 13
(ADD ORDER-REC-QUANT TO TOTAL-ORDERS.) as much because they are not in the
same TS cognitive unit. The PK structure (Figure 6.B) makes the opposite
prediction: statement 2 (MOVE 0 TO TOTAL-ORDERS.) should prime statement 13
(ADD ORDER-REC-QUANT TO TOTAL-ORDERS.) because they are in the same PK
cognitive unit, and statement 2 (MOVE 0 TO TOTAL-ORDERS.) should not prime
statement 4 (READ ORDER-FILE INTO ORDER-REC.) as much because they are not in
the same PK cognitive unit. Other co-varying features such as argument
repetition and surface distance will need to be controlled by balancing these
attributes across the set of items used (McKoon & Ratcliff, 1984).
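
The opposing predictions can be made concrete with a minimal sketch. The unit
labels ("ts_unit_a", "pk_unit_a", and so on) are placeholders invented for
this illustration and do not reproduce the labels in Figure 6; only the
same-unit and different-unit relations among statements 2, 4, and 13 come from
the text. The function names are likewise hypothetical.

```python
# Unit membership for the three statements discussed above.
# Labels are placeholders; only the groupings are taken from the text.
TS_UNIT = {2: "ts_unit_a", 4: "ts_unit_a", 13: "ts_unit_b"}
PK_UNIT = {2: "pk_unit_a", 13: "pk_unit_a", 4: "pk_unit_b"}


def predicted_primes(target, candidates, unit_of):
    """Return the candidates predicted to prime the target more strongly,
    i.e. those in the same cognitive unit under the given analysis."""
    return [c for c in candidates if unit_of[c] == unit_of[target]]


def priming_effect(rt_different_unit, rt_same_unit):
    """Priming effect as a response-time difference: positive when the
    target is recognized faster after a same-unit prime."""
    return rt_different_unit - rt_same_unit
```

For target statement 2 with candidate primes 4 and 13, the TS mapping selects
statement 4 and the PK mapping selects statement 13, which is the contrast the
experiment is designed to test.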

It is possible that neither of the theoretical decompositions shown in
Figure 6 precisely describes the programmer's decomposition. In this case
priming effects will not be obtained. However, failure to find priming
effects is not informative as to what is wrong with the theoretical proposals,
and additional measures of program comprehension are needed. One additional
measure is to ask programmers questions about their understanding of the
program text in order to ascertain what aspects of meaning can be attained in
limited time and to provide an assessment of learning relevant to
interpretation of recognition memory test results. Earlier we suggested that
there are at least four kinds of relations between program statements that
contribute to a complete understanding of the program: major functional
relations specifying the goal structure of the program (Figure 2); data flow
relations specifying the sets of events in the program in which particular
variables participate (Figure 3); control flow relations specifying the
execution sequence of statements (Figure 4); and state relations specifying
the sets of conditions and resulting actions in the program (Figure 5). A
fifth kind of information in the program consists of the detailed operations
themselves, the actions corresponding to a single statement or less.

The two general approaches to program comprehension (TS, PK) differ in
terms of the kinds of relations between program elements that are hypothesized
to be central in mental representations. Therefore, the two approaches lead
to differing predictions about the kind of information that will be directly
available to the programmer from the representation, or easier to infer from
the mental representation. The TS view suggests that when a programmer
studies a program, the meaning is built up from the bottom in terms of the
operations binding together into the control flow units that are assigned
local purpose. Major function and goal information is available only after
these relations have been built. Thus the TS view stresses detailed
operations, control flow, and then function in the representational hierarchy,
suggesting that questions about detailed operations and execution sequence
will be answered more easily (faster and with fewer errors) than will
questions about major function and data flow. The PK view suggests that when
a programmer studies a program, function is inferred immediately when a
programmer identifies a familiar stereotypic unit and the operations and data
objects will be bound to the role slots in the hypothesized plan. The PK view
stresses data flow and functional dependencies as central in the
representations, suggesting that function, data flow, and detailed operation
information should be available in that order. Neither view predicts that
state information will be easy to extract from program text, although some
languages and applications (AI programming in LISP) emphasize these relations.
Comprehension questions can be designed for a program text that ask
specifically about these different kinds of relations in the program.

Methods 

Subjects. Professional programmers with a minimum of three years of
professional programming experience served as subjects in the research.
Subjects were selected from a pool of over 400 programmers who volunteered to
participate in response to mail solicitations to Data Manager Association
members, radio and television announcements, Chicago newspaper stories, and
approaches to several Chicago-area businesses and research institutions. Our
choice of programming languages was constrained by the availability of
experienced professionals for each language. Since 85% of the volunteers
programmed primarily in COBOL, FORTRAN, and ASSEMBLER, subjects were drawn
from the COBOL and FORTRAN programmers. This provides a basis for examining
the generality of findings across the two languages most widely in use.

A total of 80 professional programmers participated in this study, 40
COBOL programmers and 40 FORTRAN programmers. Differences between FORTRAN
and COBOL programmers in educational level, college major, number of
programming languages known, and number of years programming (but not number
of years as a professional programmer) were statistically reliable (p < .01
level). The average FORTRAN programmer was 37 years old at the time of the
study, male (95% of the sample), had majored in computer science or other
science/engineering field, had completed some graduate level work beyond a
bachelor's degree, knew 6 other programming languages, had taken 4 programming
courses, had programmed for 14.5 years, had been a professional programmer for
10.8 years, and had spent an estimated 12,306 professional programming hours
on program coding, debugging, and modification tasks. Forty-three percent had
taught at least one programming course. The average COBOL programmer was 35
years old, male (77.5% of the sample), had a college degree, majored in social
science or humanities, knew 4 other programming languages, had taken 4
programming courses, had programmed for 10.5 years, had been a professional
programmer for 9.5 years, and had spent an estimated 11,196 professional
programming hours on program coding, debugging, and modification tasks.
Forty-five percent had taught at least one programming course.

Subjects  were  run  over  a  period  of  eight  months  from  July,  1983  to 
February,  1984.  Each  programmer  was  paid  $10.00  to  cover  transportation  and 
parking  costs. 

Materials. The program segments developed for the research were drawn
from four full length programs currently in use in Chicago-area computer
installations. The four programs spanned a range of program types: a batch
file update program, an engineering application, an interactive program, and a
computational program. Two of these programs were originally written in COBOL
and two in FORTRAN. Eight program segments were taken from the programs and
modified slightly so that they met the following criteria: 1) each comprised
exactly 15 lines; 2) each accomplished something sensible in isolation (i.e.,
was comprehensible as a fragment that did something concrete); 3) each
contained TS and PK units that differed in content.

One of the program segments used in the experiment is shown in Figure
1.A. This particular segment is unusual because it is extremely simple. It
was included in the research and is offered here as an example because it has
been used in many other studies of program comprehension and is frequently
used as an example in published articles (e.g., Curtis, et al., 1984; Soloway,
et al., 1983). Thus analyses in the present study can be compared directly to
previous research. Figures 6.A and 6.B show the TS and PK theoretical
analyses of the example program segment.

For each of the 8 program segments, 6 comprehension questions were
composed that varied according to the category of information about program
relations to which each pertained. Examples of each kind of question for the
program segment shown in Figure 1.A are: Will an average be computed?
(function); Is the last record in ORDER-FILE counted in COUNT-CLIENTS?
(sequence); Will the value of COUNT-CLIENTS affect the value of ACTIVE-AVG?
(data flow); When ORDER-EXIT is reached, will ORDER-REC-ID have a particular
known value? (state); Is TOTAL-ORDERS initialized to zero? (detailed
operation).

Also for each of the 8 segments, a recognition test list was constructed
to examine priming effects between items. A critical target item was
designated along with two primes (a TS prime and a PK prime) to form a triple
to be used in test list construction. The essential feature of each triple
was that the TS prime and the target were in the same cognitive unit according
to the TS analysis of the segment but in different PK units, and that the PK
prime and the target were in the same cognitive unit according to the PK
analysis of the segment but in different TS units. For example, statements 4,
13, and 2 (Figure 1.A) form a triple. Statement 4 is the TS prime because
statements 4 and 2 are in the same TS cognitive unit (Figure 6.A) but in
different PK units (Figure 6.B). Statement 13 is the PK prime because
statements 13 and 2 are in the same PK cognitive unit (Figure 6.B) but are in
different TS units (Figure 6.A). Statement 2 is the target item since it
appears in both pairs. For each program segment, 4 target items were
identified along with their TS and PK primes, and the remaining 3 lines of
code were designated as filler items. The targets were arbitrarily divided
into two sets designated as A Materials and B Materials.

There are other bases besides roles in the TS and PK representations on
which program statements might be associated in memory. For example, the
surface distances between the prime and target statements differ for the
example just given. In addition, some program statements have repeated
arguments, others are very similar syntactically, and the direction (forward
or backward) between the prime and target might differ within a triple. It
was not possible to hold all of these other factors constant within any one
triple or within any one program segment. However, the four potential
influences -- surface distance, argument repetition, syntactic similarity, and
direction between prime and target -- were balanced over all 32 (8 segments,
four targets per segment) TS-target and PK-target pairs.

Prime and target items were embedded in a recognition test list
consisting of 22 items: 7 false items and 15 true items. The 15 true items
consisted of 3 "filler" true items and 6 prime-target pairs. The 6 targets
consisted of 4 targets presented for the first time and 2 targets presented a
second time. For the first-time targets, primes were paired with targets so
that one group of subjects (within each language) saw PK primes immediately
preceding A-targets in the recognition test list and TS primes immediately
preceding B-targets. A second group of subjects (within each language) saw TS
primes immediately preceding A-targets and PK primes immediately preceding
B-targets. For the repeated targets, the prime not seen before was paired with
the target. False items used variable names that had occurred in the segment
but were connected with an operation that had not connected them in the
segment. False items did not consist of tricky misspellings or paraphrases of
the program statements since they were not intended to be lures. One of the
false items was a repeated item. Test lists were arranged in four different
orders so that each prime-target pair appeared once in each quarter of the
list, subject to the restrictions that a target item could not be placed in
the first or second position of the test list and that primes had to
immediately precede their targets. All program segments and test list items
were prepared in two programming languages, FORTRAN and COBOL. COBOL subjects
saw only COBOL segments and test items; FORTRAN subjects saw only FORTRAN
segments and test items.
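
The counterbalancing described above can be sketched in a few lines of
Python. This is only an illustration of the pairing rule and of the test list
arithmetic; the group numbers 1 and 2 and all function names are hypothetical
labels, not taken from the study materials.

```python
# Illustrative sketch; group labels and function names are assumptions.
PRIME_ASSIGNMENT = {
    1: {"A": "PK", "B": "TS"},  # group 1: PK primes precede A-targets, TS primes precede B-targets
    2: {"A": "TS", "B": "PK"},  # group 2: the reverse pairing
}


def first_time_prime(group, target_set):
    """Prime type shown before a first-time target from the given set."""
    return PRIME_ASSIGNMENT[group][target_set]


def repeated_prime(group, target_set):
    """For a repeated target, the prime type not seen before is used."""
    return "TS" if first_time_prime(group, target_set) == "PK" else "PK"


def test_list_composition():
    """Item counts for one recognition test list, as described in the text."""
    return {
        "filler_true": 3,
        "prime": 6,    # one prime per prime-target pair
        "target": 6,   # 4 first-time targets + 2 repeated targets
        "false": 7,
    }
```

Summing the counts gives 3 + 6 + 6 = 15 true items and 22 items in all,
matching the list length stated above.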

Procedure. Subjects participated in a single experimental session
lasting approximately 2.5 hours at the University of Chicago, Northwestern
University, or their place of business. To begin the session an experimenter
showed subjects the IBM Personal Computer to be used during the session and
pointed out distinctive features of the keyboard. Thereafter, all
instructions to the subjects were presented via the computer screen. The
first part of the session consisted of general task instructions, detailed
instructions concerning the use of editing features required during the task,
and a practice trial.

Subjects were told that they would study a 15-line segment of code for a
total of 4.5 minutes, divided into three 1.5 minute intervals; that between
the study intervals they would be asked to respond to comprehension questions
and would be given a memory task. They were instructed that their primary
task was to come to a complete understanding of the code so they could answer
the comprehension questions accurately. We emphasized that their responses to
the memory tasks should follow from their attempt to understand the code; that
attempts should not be made to memorize the text or use special strategies for
memorization. The exact task sequence was described and then demonstrated in
the practice session.

The following task sequence was repeated three times for each program.
Subjects studied a 15-line segment of code that appeared on the screen for
exactly 1.5 minutes. Following instructions to prepare for comprehension
questions, subjects responded to each question by pressing "yes" or "no."
Response latencies and actual responses were recorded by the controlling
program. The next screen announced the free recall section and subjects typed
in as much of the 15-line segment as they could recall, in any order that it
occurred to them. The third study-comprehension trial ended with a
recognition memory test in place of the recall task. Recognition started with
a screen reminding subjects to position their fingers correctly, to respond
"yes" or "no" as quickly and accurately as possible, and not to pause during
the list presentation. Subjects initiated the test with a keypress, with
subsequent lines triggered by the previous response. The response and the
response latency for each item were recorded.

The three study-test trials occurred for each of the eight program
segments with a break between the fourth and fifth segments. At the
conclusion of the session, subjects filled out a detailed background
questionnaire and responded to questions posed by the experimenters about
their reactions to the experiment and their own programming work.

These procedures were established by extensive pilot testing. For
example, the total study time of 4.5 minutes was chosen to ensure high levels
of recognition accuracy and a moderately high level of segment comprehension.
The comprehension questions were inserted before the recall and recognition
tasks to focus subjects on the comprehension aspects of the task rather than
on the memory requirements and to discourage inclinations to rehearse or
retain a visual image of the text.

Design. The program segment and test list materials were used to form
the basic research design: 2 (languages) x 4 (orders) x 2 (subject groups
within language) x 2 (TS, PK prime types) x 2 (A, B sets of target items).
Language, order, and subject group were between subjects factors, and subject
groups, prime type, and materials set formed a 2x2x2 repeated measures Latin
square. In this design, comparisons between target response times for
different prime types will be a within subjects comparison but for different
materials sets. A rearrangement of this design using first and second
presentations of only those target items that were repeated in the test lists
allows a within subject comparison between identical target response times for
different prime types. This comparison is of secondary interest because
repetitions of true items in the test list are potentially confusing and may
add variability to response times for these items.

This design provides tests of whether programmers' mental representations
of program text reflect structural distances hypothesized by the TS analysis,
the PK analysis, or neither analysis. Specifically, support for a TS
macrostructure is obtained if response times to targets preceded by a TS prime
are reliably faster than the same targets preceded by a PK prime. If this is
the case, we can infer that the items specified by the TS analysis as forming
a cognitive unit are in fact "closer" in memory than are the items specified
by the PK analysis. Alternatively, support for a PK macrostructure is
obtained if response times to targets preceded by a PK prime are reliably
faster than the same targets preceded by a TS prime. Finally, if some
response times to PK-primed targets are faster and other response times to
TS-primed targets are faster, then no inferences may be drawn regarding which of
the formulations more accurately portrays the nature of mental
representations.
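
The inference rule in this paragraph can be written out as a small Python
sketch. It is a simplification under stated assumptions: the `reliable` flag
stands in for the outcome of the analysis of variance (not computed here), the
function names are hypothetical, and any response times passed in would be
illustrative values rather than data from the study.

```python
from statistics import mean


def priming_effect(rt_by_prime):
    """Mean PK-primed RT minus mean TS-primed RT.

    rt_by_prime maps "PK" and "TS" to lists of target response times.
    A positive value means TS-primed targets were answered faster.
    """
    return mean(rt_by_prime["PK"]) - mean(rt_by_prime["TS"])


def favored_analysis(effect, reliable):
    """Map a priming effect onto the three outcomes described in the text.

    `reliable` is an assumed stand-in for a significant F test.
    """
    if not reliable or effect == 0:
        return "neither"
    return "TS" if effect > 0 else "PK"
```

Under this sketch a reliable positive difference counts as support for the TS
macrostructure, a reliable negative difference as support for the PK
macrostructure, and anything else licenses no inference.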

Response times and error rates for different kinds of comprehension
questions provide an additional measure regarding relations that dominate in
mental representations. Specifically, if support for a PK macrostructure is
obtained with the recognition response times, then we expect to see fewer
errors and faster response times for function and data flow comprehension
questions. Alternatively, if support for a TS macrostructure is obtained with
the recognition response times, then we expect to see fewer errors and faster
response times for detailed operations and control flow comprehension
questions.

Results 

Recognition Memory Data. The question of primary interest is whether
target response times are faster (1) when the target is immediately preceded
by an item from a hypothesized TS cognitive unit, or (2) when the target is
immediately preceded by an item from a hypothesized PK cognitive unit, or (3)
whether there is no difference between priming conditions. After removing
outlying response times, response times of incorrect answers, and items for
which a prime error was made, a mean was computed for each subject for each of
the two target sets (A, B). These means were analyzed in a 2x2x4x2 repeated
measures analysis of variance with language (FORTRAN, COBOL), subject group
(1, 2), and test list order (4 orders) as between subjects factors and type of
prime (PK, TS) as a within subject repeated measure. A second 2x2x2 analysis
of variance was performed on these data, treating materials as a random factor,
with language (FORTRAN, COBOL) and materials set (A, B) as between items
factors, and prime type (PK, TS) as a within item repeated measure.

Examination of the cell means reveals that there are multiple influences on
target response times (see Table 2).

Insert  Table  2  about  here 

As predicted by a text structure (TS) analysis of program comprehension,
responses to TS-primed targets are on average 105 milliseconds faster than
responses to PK-primed targets, F(1,84) = 4.51, p < .04 (subjects analysis,
see Table 2, Part A), F(1,80) = 3.22, p < .06 (items analysis). Considering
only subjects whose comprehension scores were in the top quartile (since these
subjects had a more complete understanding of the program segments), we see
(Table 2, Part B) that the TS-primed speedup is larger, 237 milliseconds,
F(1,15) = 3.35, p < .02 (subjects analysis), F(1,59) = 3.60, p < .06 (items
analysis). Comparisons using the repeated target data show the same advantage
for TS-primed targets, although the effect for repeated targets is
statistically unreliable due to increased variance in repeat target times.

Target response times also differed for the (arbitrarily composed) A and
B materials sets, for the two languages, and for the subject groups within each
language. Responses to B materials took an average of 21 milliseconds longer
than responses to A materials, F(1,64) = 28.18, p < .001 (subjects analysis),
F(1,60) = 2.32, p < .14 (items analysis), and this difference was larger for
COBOL items than for FORTRAN items, F(1,64) = 6.03, p < .02 (subjects
analysis, not significant for items analysis). COBOL subjects took longer to
respond in general, F(1,64) = 4.91, p < .04 (subjects analysis), F(1,60) =
4.58, p < .04 (items analysis), and there was a subject group within language
difference, F(1,64) = 5.29, p < .03 (subjects analysis), F(1,60) = 34.54,
p < .001 (items analysis). Subject group differences may not be attributed to
experimental manipulations since response times for the 24 filler true items
not involved in any experimental manipulation reveal identical differences
between language and subject groups.

The array of effects can be seen more easily graphically (see Figure 7).
In Figure 7.A, response times are adjusted for the effect of subject groups
within language, showing the TS priming effect for both languages, the effect
for materials set, and the slight interaction between materials and language.
In Figure 7.B, means are adjusted for the effect of materials sets, again
showing the TS priming effect for both languages and the effect for subject
group within language.

Insert  Figure  7  about  here 

Overall recognition accuracy, measured by percent correct, averaged
92.1%. FORTRAN subjects made fewer recognition errors (6%) compared to COBOL
subjects (9.8%), F(1,64) = 7.36, p < .01, but recognition accuracy did not
differ for subjects assigned to the two experimental conditions, nor were
there any differences in recognition accuracy due to order of presentation of
the program segments (Fs less than 1). Accuracy for target recognition items
across subjects and items averaged 92.6% correct. Error rates differed by
language (FORTRAN 5.8%, COBOL 9.1%; F(1,69) = 9.19, p < .05) but did not
differ by experimental condition or by materials set (A-targets, B-targets)
(Fs less than 1). Correct responses averaged 2.670 seconds compared to
incorrect responses, which averaged 3.667 seconds, F(1,76) = 63.60, p < .001,
suggesting a difficulty relationship between speed and accuracy rather than a
speed/accuracy tradeoff. Thus, interpretation of the above results is not
affected by error rates and a constant error rate of 8-10% may be assumed.

It is not surprising that materials and subject differences account for a
major portion of variability in time to recognize program statements.
However, on top of these differences, program statements are consistently
recognized faster when immediately preceded by a program statement in the same
control flow unit (TS analysis). This result strongly supports the mental
organization of program text proposed by a text structure analysis. On the
basis of this result, we expect certain converging results in the programmers'
responses to comprehension questions about these same program segments.

Comprehension Data. Our main interest in comprehension accuracy is in
differences that might occur between items in different information
categories, that is, between questions asking about different kinds of
information in program text. Referring to the earlier text analyses (Figures
2-5), the 48 comprehension questions comprised 10 questions about detailed
program operations (operations questions), 9 questions about program execution
sequence (control flow questions), 9 questions about program data flow (data
flow questions), 10 questions about program condition-action relations (state
questions), and 10 questions about major program functions (function
questions). We assume that higher error rates for questions in a particular
information category imply that the information in that category is less
easily accessed or computed from the memory representation. Under this
assumption the text structure (TS) analysis (Figure 6.A) predicts that
operations and control flow questions will be more easily answered and that
data flow and major functions will be more difficult to infer. The plan
knowledge (PK) analysis (Figure 6.B) predicts that data flow and major
function information will be most accessible, with operations and control flow
less accessible. State information is not predicted to be accessible under
either formulation. The recognition memory results discussed above lead us to
expect further support for the TS formulation: operations and control flow
questions will be most easily answered.

Error rates were computed for each subject for items in each information
category (percent of items missed in the category) and for each item (percent
of subjects missing an item). The five scores per subject were submitted to a
language (FORTRAN, COBOL) by information category (operations, control flow,
data flow, state, function) repeated measures analysis of variance (with
subjects as random factor) and the item scores were submitted to a language by
information category analysis of variance (with items as random factor).
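
The two kinds of error scores described above can be illustrated with a short
Python sketch. The record layout (subject, item, category, correct) and the
function name are assumptions made for the example; the study's actual data
format is not given in the text.

```python
from collections import defaultdict


def error_rates(records):
    """Compute percent-error scores by subject-and-category and by item.

    records: iterable of (subject, item, category, correct) tuples,
    a hypothetical layout chosen for this sketch.
    """
    by_subject = defaultdict(lambda: [0, 0])  # (subject, category) -> [errors, n]
    by_item = defaultdict(lambda: [0, 0])     # item -> [errors, n]
    for subject, item, category, correct in records:
        for key, table in (((subject, category), by_subject), (item, by_item)):
            table[key][0] += 0 if correct else 1
            table[key][1] += 1
    subject_scores = {k: 100.0 * e / n for k, (e, n) in by_subject.items()}
    item_scores = {k: 100.0 * e / n for k, (e, n) in by_item.items()}
    return subject_scores, item_scores
```

The first dictionary corresponds to the scores entered into the subjects
analysis (one per subject per information category) and the second to the
scores entered into the items analysis.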

Information category of the comprehension question affected error rates,
F(4,312) = 26.75, p < .001 (subjects analysis), F(4,86) = 2.87, p < .03 (items
analysis). The ordering of difficulty of the information category questions
was predicted by the text structure analysis: questions about detailed
operations and control flow relations were answered most accurately (15%, 21%
errors respectively) while more errors were made on data flow, state, and
function items (28%, 30%, 39% errors respectively).

Overall comprehension levels for FORTRAN and COBOL programmers differed
reliably when subjects were the unit of analysis, F(1,78) = 6.01, p < .02,
although with items as the unit of analysis, variability among items swamped
this difference, F(1,86) = 1.10. In addition, the pattern of error rates for
information categories differed for the two languages, F(4,312) = 8.72,
p < .001 (subjects analysis), F(4,86) = .827 (not significant, items
analysis). The comprehension pattern across information categories for
FORTRAN subjects (see Figure 8) was: questions about operations and control
flow were answered most accurately, questions concerning major function next
most accurately, followed by data flow and state questions. This yielded an
inverted U shaped pattern with operation, sequence and function depressed,
showing lower error rates (Figure 8). The most noticeable difference between
the FORTRAN and COBOL patterns was the elevated error rate on function
questions for COBOL subjects, creating an increasing pattern across
information categories. In addition to the elevation of major function
question errors, COBOL data flow questions showed slightly higher accuracy.

Insert  Figure  8  about  here 


The difference in patterns could be due to one or more of three factors.
First, subjects from the two language groups were not equivalent in background
characteristics. Second, the languages themselves could yield differences in
ease of comprehension such that control flow and function information are
easier to extract from FORTRAN than from COBOL text, and data flow information
is easier to extract from COBOL than from FORTRAN. Third, the pattern of
differences might be due to lower comprehension levels for the COBOL subjects
and thus the COBOL pattern could reflect an earlier stage in the comprehension
process and the FORTRAN pattern a later stage. Under the third
interpretation, FORTRAN subjects, having understood more of the operation and
control flow information, may have progressed further and could therefore
either compute or retrieve function information from memory. COBOL subjects
would not yet have reached this stage in the comprehension process after the
allotted study time.

The first explanation, programmer background, is not supported because
these variables are unrelated to comprehension performance on our task. For
example, comprehension accuracy levels were equivalent for males and females,
F(1,78) = 1.32; for different educational levels, F(3,76) = .64; and for
different college majors, F(3,71) = 1.16. This is important because it
suggests that any differences in performance between language groups will not
be accounted for by sample differences in sex, education, and college major.
There were also no differences in comprehension accuracy between programmers
who had taught programming and those who had not, F(1,78) = .96.

In order to examine the other two explanations, subjects were divided
into quartiles (within language) on the basis of their overall comprehension
accuracy scores. If the overall FORTRAN pattern represents a later stage of
comprehension, then the upper quartile COBOL subjects should show a pattern
more like the FORTRAN aggregate and the lower quartile FORTRAN subjects should
show a pattern more like the aggregate COBOL subjects. Alternatively, if the
differences in aggregate patterns reflect fundamental features of the
language, then those differences will appear the same in the patterns for top
and bottom quartile comprehension subjects. The comprehension question means
across information categories for top and bottom quartile subjects are
displayed graphically in Figure 9.

Insert Figure 9 about here

Consistent with the stage of comprehension explanation, bottom quartile
subjects showed the COBOL aggregate pattern with elevated error rates for
major function questions and top quartile subjects showed the inverted U
pattern of the FORTRAN aggregate. A quartile by information category
interaction, the statistical manifestation of this pattern, is only marginally
reliable, F(4,64) = 2.43, p < .06 (subjects analysis), F(4,86) = 2.02, p < .10
(items analysis).
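
The quartile split within language used in this analysis might be sketched as
follows in Python. The function name and data layout are hypothetical, and the
treatment of ties and of group sizes not divisible by four is an assumption of
this sketch; the text does not say how such cases were handled.

```python
def quartile_groups(scores_by_language):
    """Split subjects into top and bottom quartiles within each language.

    scores_by_language: {language: {subject: overall accuracy}}.
    Ties are broken by sort order and the quartile size is floor(n / 4);
    both choices are assumptions of this sketch.
    """
    groups = {}
    for language, scores in scores_by_language.items():
        ranked = sorted(scores, key=scores.get, reverse=True)
        k = len(ranked) // 4
        groups[language] = {
            "top": ranked[:k],
            "bottom": ranked[-k:] if k else [],
        }
    return groups
```

Ranking within language rather than over the pooled sample keeps the overall
FORTRAN and COBOL difference in comprehension level from determining quartile
membership.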

Some language specific features are also retained by the information
category patterns, namely lower error rates for control flow questions for
FORTRAN subjects (both quartile groups) and lower error rates for data flow
questions for COBOL subjects (both quartile groups). This is consistent with
an explanation attributing differences to language specific features that
affect the ease of extracting information from the text. However, as in the
analysis of the complete subject sample, this interaction between language and
information category shows statistical reliability only in the subjects
analysis, F(4,64) = 3.99, p < .006, not in the items analysis, F(4,86) = .619.

In summary, after limited study time, experienced programmers'
comprehension errors were strongly related to the kinds of inferences required
to respond to the question, and to the language in which the programs were
written (Figure 8). Questions about program operations and control flow
relations in the programs were answered correctly more often than questions
about data flow relations and program states. Errors were made most
frequently when inferences about program function were required. This general
pattern supports the TS theoretical formulation (Figure 6.A). Differences
between FORTRAN and COBOL programmers and between top and bottom quartile
comprehenders suggest that inferences about program function (what it does)
are the most difficult and appear late in the comprehension process, and that
there are probably language specific features that affect the ease of certain
kinds of inferences (Figure 9). FORTRAN programmers were consistently better
on inferences about control flow while COBOL programmers were more accurate in
responding to questions about program data flow relations.

In addition to data on comprehension errors, summarized above, we
analyzed response times to comprehension questions, providing additional
information about the relative difficulties of different kinds of inferences
in comprehension. After correcting for reading time, we assume that a
relatively faster response time to a question implies less processing has gone
into constructing a response to the question. Faster responses could be due
to direct retrieval of the information, to assessment of plausibility given
retrieval of higher level information, or to fast computation of the response
given retrieval of related information. Slower responses imply extensive
search or burdensome computation from retrieved information (Reder, 1982;
Glucksberg & McCloskey, 1981). For example, if major function information is
stored directly and provides the macrostructure (Kintsch & van Dijk, 1978) of
the text representation (Adelson, 1984; Atwood & Ramsey, 1978; Brooks, 1983;
Shneiderman, 1980), then the fastest response times should occur for major
function questions, with slower responses for more detailed comprehension
questions like those in the operations category. However, if frequent errors
correspond to the need to assemble responses at the time of questioning, then
the error rate data above imply that responses to operations questions will be
faster than responses to questions about program function.

Response times for comprehension questions about different
information categories, standardized and adjusted for question length,
correspond to the pattern of comprehension errors for upper quartile subjects
(Figure 9). Overall, correct responses to true statements about the operations
and control flow of the program were made more quickly (mean residuals
were -.33 and -.16 respectively) than responses to questions about data flow
and function (-.06, -.11), which were made more quickly than responses to
questions about program states (.06), F(4,309) = 8.06, p < .001 (subjects
analysis), F(4,86) = 2.86, p < .03 (items analysis). Analyses of raw response
times did not differ from analyses of residual response times due to relatively
low correlations between question length and response time.
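
One way to obtain response times that are standardized and adjusted for
question length is to standardize the times and then take residuals from a
simple linear regression on length. The Python sketch below shows that
approach; it is an assumed procedure for illustration only, since the exact
adjustment used in the study is not spelled out in this section, and the
function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert values to z-scores (assumes the values are not all equal)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def length_adjusted_residuals(lengths, times):
    """Residuals of standardized response times regressed on question length.

    A negative residual means the response was faster than its length predicts.
    Assumes the question lengths are not all equal.
    """
    z = standardize(times)
    mx, my = mean(lengths), mean(z)
    sxx = sum((x - mx) ** 2 for x in lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, z))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(lengths, z)]
```

Mean residuals per information category could then be compared in the way the
paragraph above compares them, with negative means indicating faster than
expected responses.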

The response time data and the error data for the comprehension questions
support the following conclusions: (1) Detailed operation information is
stored directly, organized by control flow units (lowest percent errors,
fastest response times). (2) Control flow inferences are thus easy to
retrieve or compute (low percent errors, low moderate response times). (3)
Some data flow and major function information is readily available (moderate
response times) although when not stored, it is not easily computed (high
percent errors). (4) Program state inferences are difficult to compute (long
response times, high percent errors).

Discussion

The present results provide evidence that the dominant memory
representation, formed during comprehension of short program texts in this
experimental context, is organized by a small set of abstract program units
related to the control structure of the program. More specifically, of the
four program abstractions presented earlier (Figures 2 through 5), relations
captured by the procedural, control flow abstraction (Figure 2) appear to be
central in comprehension in our experimental task. Furthermore, the nature of
the mental unitization of these relations corresponds to the basic program
building blocks of sequence, iteration, and conditional identified by early
advocates of structured programming.

Both recognition memory results and comprehension question results
converge to support this conclusion. In the recognition memory test,
recognition occurred faster when a statement was immediately preceded by a
statement in the same text structure unit than when it was immediately
preceded by a statement that was not in the same text structure unit. This
implies that statements in the same TS unit were closer together in
programmers' memory structures. This priming effect cannot be accounted for by
the text surface distance between the statements, by syntactic similarity
between statements, or by argument repetition since these features were
controlled by counter-balancing test items. Furthermore, comprehension
questions about control flow relations and program operations
were answered faster and with fewer errors than were questions about data flow
and function relations, supporting the idea that control flow and operation
information is easier to access in memory.

Examination of the performance of programmers with the highest
comprehension scores strengthens these conclusions and leads to a further
speculation that segmentation on the basis of control flow relations occurs
prior to comprehension of major program functions and data flow relations.
First, the priming effect for TS cognitive units was strongest for top
comprehenders. Second, top comprehenders' fast and error free responses to
detail and control flow comprehension questions were accompanied by a
disproportionate decrease in response errors to function questions. While one
might argue on the basis of the error data alone that there is a priority of
function information in top comprehenders' memory representations, the priming
results and response time data undercut such a conclusion.

These empirical results fit a view of program comprehension in which the
meaning of program text is developed largely from the bottom up. The text is
first segmented according to simple control patterns segregating sequences,
loops, and conditional patterns. At this level some specific inferences are
made concerning the procedural roles of the segments. For example, a sequence
at the beginning in which zeros are assigned to variables may be designated
"initialization of variables" (see Figure 6.A), without regard for the role of
those variables in later computation. Another sequence may be designated as
"something is calculated." A loop repeats whatever sequence is contained
within it. A conditional pattern directs control to alternate sequences.

Data flow and function connections often require integration of
operations across separate segments. For example, calculation of an average
involves an initialization, a running sum, and a final calculation. As in
Figure 6.A, these occur in three separate procedural units. The results
suggest that these connections are made later in comprehension, and for
programmers with the lowest comprehension scores they are not made correctly
or at all within the time limits imposed by this study.
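
The average example can be made concrete with a short sketch. The Python
below is an illustration only and is not the program shown in Figure 6.A; the
function name `average` and its variable names are hypothetical. It shows how
the steps that jointly compute an average fall into three separate procedural
units (a sequence, a loop, and a conditional), so that recovering the function
requires linking operations across units.

```python
def average(values):
    """Compute a mean using three textually separate procedural units."""
    # Procedural unit 1 (sequence): initialization of variables.
    total = 0.0
    count = 0

    # Procedural unit 2 (loop): the running sum is accumulated here.
    for value in values:
        total += value
        count += 1

    # Procedural unit 3 (conditional): the final calculation, guarded
    # against an empty list.
    if count == 0:
        return 0.0
    return total / count
```

A reader who segments this text by control pattern sees "initialize", "loop",
and "test" before seeing that `total` and `count` tie the three units into a
single function, "compute the average".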

Several alternative views of program comprehension are not supported by
the research results. For example, views based on strong analogies to chess
players' perceptual pattern recognition processes are not supported (cf.,
Greeno & Simon, 1984). Patterns in program text that are recognized quickly
appear to be general, few in number, and discernible through syntactic
markers of the language (programming keywords). Proposals that programs are
understood initially through recognition of program plans, drawn from an
expert's mental library of hundreds or thousands of plans that assign roles to
configurations of program statements, are not supported by this research (Rich
& Shrobe, 1979; Soloway, et al., 1983). While plan knowledge may well be
implicated in some phases of understanding and answering questions about
programs, the relations embodied in the proposed plans do not appear to form
the organizing principles for memory structures. Claims that data flow
relations (Atwood & Ramsey, 1978; Kant & Newell, 1984; Weiser, 1982) or
function hierarchies (Adelson, 1984) underlie the preferred or natural
representation of programs are also not supported. Our results suggest that
the natural representation of programs is procedural, at least for programs
written in traditional programming languages. The best comprehenders in the
present study were better at inferring function relations than were poorer
comprehenders, but they also showed stronger effects for text structure memory
organization.

A final result concerns the influence of programming language on
comprehension (Green, 1980; Green, et al., 1980). There is an indication in
the data that programming language may affect the relations represented in
initial phases of comprehension and the difficulty of extracting different
kinds of information from the program. COBOL programmers were consistently
better at responding to comprehension questions about data flow than were
FORTRAN programmers. Furthermore, control flow relations were less easily
inferred by COBOL programmers. This may be due to features of COBOL and
FORTRAN that allow FORTRAN to be programmed in an order corresponding more
closely to actual flow of control from statement to statement (at least in
these short segments). In COBOL it is more usual to perform loops that are
listed elsewhere in the code. Thus the surface structure of the text in COBOL
corresponds less well to execution sequence than in FORTRAN. This can be seen
by comparing Figure 1.A (program text) and Figure 6.A (TS analysis), in which
the loop that sums a list of numbers is executed in statement but is
specified in statements 11 through 15. In the FORTRAN version of this
segment, the loop occurs in statements 5 through 9 of the code and is executed
when encountered there. Thus, there is some evidence that disruption of
procedural units in the program text may affect comprehension patterns.
Although the data are consistent with a hypothesis that greater difficulty in
extracting procedural text units is related to greater difficulty in
extracting program function, we must label this conclusion tentative because
COBOL and FORTRAN languages and programmers differ in other ways as well.
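
The contrast between the two text orderings can be sketched as follows. This
is a rough Python analogy, not COBOL or FORTRAN and not the study's stimulus
text; all function names are hypothetical. In the first function the loop is
written where it executes, as described for the FORTRAN version; in the
second, the loop is invoked at one point while its body is listed elsewhere,
loosely in the manner of a performed COBOL paragraph.

```python
def sum_inline(numbers):
    """The loop is listed at the point where it executes."""
    total = 0
    for n in numbers:
        total += n
    return total


def sum_performed(numbers):
    """The loop is invoked here, but its body is listed elsewhere."""
    state = {"total": 0, "index": 0}
    while state["index"] < len(numbers):
        add_next_number(state, numbers)  # body appears later in the listing
    return state["total"]


def add_next_number(state, numbers):
    """Loop body, textually separated from the statement that runs it."""
    state["total"] += numbers[state["index"]]
    state["index"] += 1
```

Both functions return the same value; only the correspondence between listing
order and execution order differs, which is the property the text suggests may
affect how easily procedural units are extracted.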

It is important to question the extent to which the particular task used
in the present research limits generalization of these results. For example,
the conclusion that comprehension has a more bottom-up character and is
organized in memory by procedural control constructs may be specific to
understanding small program texts that do not have a larger context. Even if
this were the case, it would not sharply dilute the importance of the present
experiment. First, most research on programming skill has used similar short
texts. Thus findings in this skill domain must be reconciled with the current
results. For example, developers of the plan knowledge theories have suggested
that expert programmers' recall of texts like these is due to recognition of
program plans (Greeno & Simon, 1984; Soloway, et al., 1983). The present
results suggest that this is not the case and that chunking in recall is
instead explained by the grouping of statements into the sequence, loop, and
conditional text units suggested by structured programming advocates. Second,
the program segments used in the present research were all texts that were
originally embedded within larger programs. To the extent that the larger
context does not illuminate all program segments equally, the kind of
processing reported in this experiment will certainly occur in actual
programming tasks. However, we are also interested in an empirical analysis
of the extent to which the first study's results are general across different
programming tasks and for longer programming texts. This question is
addressed in the second study.

STUDY  TWO 

In the first study, programmers' comprehension strategies may have been
influenced by several aspects of the experimental task: short undocumented
program segments, the series of short study trials, and the demands of memory
questions. In Study Two, a more natural programming environment was created
in which programmers studied a program of moderate length (200 lines) and then
made a modification to it. At two different points in time they were asked to
summarize the program and respond to comprehension questions. Half of the
programmers were asked to think aloud while they worked and the other half
worked silently.

As in the previous study, comprehension questions were designed to ask
about particular relations between program parts: control flow, data flow,
function, and condition-action relations. If the results of the previous
study generalize to this task environment, then we expect to see good
comprehension of control flow relations early in the comprehension process
with comprehension of data flow and function catching up later in the process.
Alternatively, data flow and function inferences may be made more readily at
the outset due to the larger context in the Study Two program text.


Subjects. Forty of the 80 professional programmers who participated in
the previous study were invited to return for the second study. These 40
subjects included 20 COBOL and 20 FORTRAN programmers and were those
programmers who had scored in the top and bottom quartiles in the
comprehension task in the previous study. Subjects were run over a period of
six months from September 1984 to February 1985. Each programmer was paid a
$50.00 fee for participation.

Materials. The stimulus program used for this research is a 200-line
program currently in production use at a Chicago firm. The program was one of
a series of programs that keeps track of and computes specifications for
industrial plant designs. Originally written in COBOL, the program includes
both file manipulation and computation. The text was easily translated into a
believable FORTRAN program. The program contained a minimal amount of
documentation, as in the original production version of the program. The
documentation included an introductory set of comments describing the program
as one that keeps track of the space allocated for wiring (called cables
below) and the wiring assigned to that space during the design of a building.
No documentation was included within the COBOL text, but the FORTRAN version
contained one-line comments corresponding to COBOL paragraph headers. Thus
the level of documentation in the two versions was judged to be equivalent,
with the naturally-occurring exception that variable names were shorter in
FORTRAN.

A modification task was devised that required altering the program to
produce an additional output file and an exception list. As with most
non-trivial modifications, this task required a relatively complete
understanding of the goals of the original program (function), how different
variables entered into computations and outputs (data flow), and where in the
execution sequence certain transformations occurred (control flow).

A list of 40 comprehension questions was constructed that included 10
questions about control flow (e.g., Is a point number grouping processed
normally when a type code for a cable is not found?), 10 questions about data
flow (e.g., Does the value of TPR-UIDTH influence the value of DESIGN-INDEX
for a particular point number?), 10 questions about function (e.g., Is a
report created with point numbers that exceed a DESIGN-INDEX?), and 10
questions about program states (e.g., When the end of the POINT-INDEX file is
reached, can there be records in the TEMP-EXCEED file that have not yet been
read?). Half of the questions were correctly answered with a "yes" response
and half with a "no" response. The forty questions were divided into two
matching lists of twenty questions. For a question on the first list, a
similar question was included on the second list so that the two lists
contained comparable questions. The lists were arranged in a single random
order.

Procedure. Subjects participated in one experimental session lasting
approximately 2.5 hours at the University of Chicago, Northwestern University,
or their place of business. Subjects were familiar with the IBM personal
computer used during the session since all subjects had participated in the
previous study. All instructions were presented on the display monitor. The
first part of the session consisted of general task instructions and detailed
instructions concerning the method of displaying and altering the program
text, including practice manipulating a program listing using these features.

Programmers were instructed that they were to make a modification to a
program normally maintained by another programmer. However, the "other
programmer" was going on vacation and the modification was urgent. The
subjects' task then was to become familiar with the program and to make the
changes to it. Furthermore, the hypothetical other programmer had left the
program with the subject to study and would return in 45 minutes to explain
the modification task. Subjects accepted this scenario as realistic and
meaningful. Thus in the study phase programmers studied the 200-line program
for 45 minutes. Half of the subjects were instructed to think aloud (Talk
Condition) while they studied and the remaining half were allowed to study
silently (Notalk Condition).

The program text was presented on the computer display and subjects could
scroll forward or backward, jump to another place in the program, or split the
screen into halves and scroll either half. Subjects were also allowed to take
notes or draw diagrams while studying the program. Most of the programmers
were familiar with studying programs on a terminal, but for those who were not,
the split screen feature served the purpose of keeping a finger in a listing
and jumping ahead in the listing. The program controlling the experiment kept
track of the programmer's study sequence by recording which program line was
in the center of the display screen.

After the 45 minute study period, programmers were asked to type in a
summary of the program and then to respond to the first list of 20
comprehension questions. In responding to the comprehension questions,
programmers positioned their fingers on "yes" and "no" response keys. They
had been instructed to respond yes or no as quickly and accurately as
possible. On presentation of a question, subjects responded and then were
given an opportunity to explain their responses. They then positioned their
fingers to receive the next comprehension question.

After the 20 comprehension questions and a short break, the modification
task was explained to the programmer and a time limit of 30 minutes was
specified. Subjects were told that they should begin actual modifications at
any point when they felt ready. If necessary the full 30 minutes could be
spent continuing to learn about the program. However, all programmers had at
least begun to make modifications by the end of the period and many had
completed their changes. During the 30 minute modification phase, the Talk
Condition subjects were again asked to think aloud while they worked and the
Notalk subjects were permitted to work silently.

The session concluded with a second request to summarize the program and
then to respond to the second list of 20 comprehension questions. The
procedure for these tasks was the same as before. The controlling program
recorded all responses, explanations, and times to respond.

Design. Comprehension question responses form the focus of the analyses
for the current report in the following research design: 2 (COBOL, FORTRAN
languages) x 2 (Q1, Q4 comprehension quartiles) x 2 (Talk, Notalk Conditions)
x 2 (comprehension test lists after study, after modification) x 4 (control
flow, data flow, state, and function information category of comprehension
questions). Language, comprehension quartile, and the Talk/Notalk Condition
were between subjects factors; time of test and information category of the
comprehension questions were repeated measures within subjects.
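
The factorial structure can be enumerated directly. The sketch below is a
bookkeeping aid written in Python, not the analysis software used in the
study; the variable names are hypothetical, and the degrees-of-freedom
arithmetic assumes a standard mixed-design analysis of variance with all 40
subjects retained.

```python
from itertools import product

# Between-subjects factors: each programmer belongs to exactly one cell.
between = {
    "language": ["COBOL", "FORTRAN"],
    "quartile": ["Q1", "Q4"],
    "condition": ["Talk", "Notalk"],
}
# Within-subjects (repeated) factors: every programmer contributes to all.
within = {
    "test_time": ["after study", "after modification"],
    "category": ["control flow", "data flow", "state", "function"],
}

between_cells = list(product(*between.values()))
within_cells = list(product(*within.values()))

n_subjects = 40
# Error degrees of freedom for between-subjects effects: subjects minus cells.
error_df_between = n_subjects - len(between_cells)
# Degrees of freedom for the four-level information category factor.
category_df = len(within["category"]) - 1
# Error degrees of freedom for effects involving information category.
error_df_category = category_df * error_df_between

print(len(between_cells), len(within_cells))
print(error_df_between, category_df, error_df_category)
```

Under these assumptions the design has 8 between-subjects cells and 8
within-subjects cells, giving 32 error degrees of freedom for between-subjects
effects and 3 and 96 degrees of freedom for effects involving information
category, which agrees with the degrees of freedom attached to the test
statistics in the results.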


Results

Analyses of the proportion of errors in response to comprehension
questions about different kinds of program relations reveal a pattern of
errors that varies across information category according to time of
comprehension test and talk-aloud condition of the programmer. Conditions at
the first time of test, after the 45 minute study phase, are most directly
comparable to the conditions of comprehension testing in Study One. For this
reason, results are presented separately below for comprehension after the
study phase and after the modification phase. In addition, analyses of
program summaries are available for summaries collected after the study phase.
Summaries collected after modification could not be analyzed in the same terms
because programmers tended to refer to their earlier summaries and then to
concentrate on describing their modifications rather than giving complete
program summaries as instructed.


After 45 minutes of study, the comprehension pattern for comprehension
questions about control flow, data flow, program state, and function relations
resembles the comprehension pattern observed for Study One, with questions
about control flow answered most accurately, followed closely by data flow.
Errors on function questions and program state questions are relatively more
frequent, F(3,96) = 11.64, p < .001 (see Figure 10). This pattern across
information categories did not differ reliably by language, quartile, or
talk-aloud condition (Fs less than 2). Upper and lower quartiles differed in
overall level of comprehension, F(1,32) = 15.75, p < .001, with upper quartile
subjects making approximately 40% errors and lower quartile subjects making
60% errors. These error rates are high in part because they have been
corrected for guessing by using the explanations provided by the programmer
to determine comprehension. Uncorrected error rates averaged 25% for upper
quartile subjects and 3H for lower quartile subjects. Analyses performed on
uncorrected error rates yielded the same results as those performed on error
rates corrected for guessing.


Insert  Figure  10  about  here 


Program summaries were analyzed by classifying each summary statement
according to the kind of program relation to which it referred and according
to the level of detail specified in the statement. The first classification
is referred to as the type of summary statement in terms of information
categories; types included procedural, data flow, and function statements.
These distinctions are best illustrated by the following excerpts from
summaries. Procedural statements include statements of process, ordering, and
conditional program actions. The summary of S109 consisted of mostly
procedural statements,

"...after this, the program will read in the cable file, comparing
against the previous point of cable file, then on equal condition
compares against the internal table...if found, will read the
tray-area-point file for matching point-area in this read if found.
Will create a type-point-index record. If not found, will read
another cable record..."

Data flow statements also include statements about data structures. S415
wrote a summary that contains references to many data flow relations,

"The tray-point file and the tray-area file are combined to create a
tray-area-point file in phase one of the program. Phase two tables
information from the type-code file in working storage. The
parameter file, cables file, and the tray-area-point file are then
used to create a temporary-exceed-index file and a point-index
file..."

S057 wrote a summary that contains many function statements,

"...the program is computing area for cable accesses throughout a
building. The amount of area per hole is first determined and then
a table for cables and diameters is loaded. Next a file is
read to accumulate the sum of the cables' diameters going through
each hole..."

The examples above also differ in the level of detail contained in the
summaries, the second dimension on which we classified summary statements.
Four levels of detail were specified for coding: (a) detailed statements
contained references to specific program operations and variables; (b) program
level statements referred to a program's procedural blocks such as a search
routine or to files as a whole; (c) domain level statements talked about real
world objects such as cables and buildings; and (d) vague statements did not
have specific referents. The excerpts presented above were also chosen
because they differ in the predominant level of detail. S109's procedural
summary is most detailed; S415's data flow statements are at a program (file)
level; and S057's function statements are at a domain level. An example of a
vague statement is, "this program reads and writes a lot of files."
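
The two-way coding scheme can be represented as a small data structure. The
following Python sketch is illustrative only and is not the coding procedure
used in the study; the class and function names are hypothetical, and the
three sample codings simply follow the text's description of the excerpts
quoted above (statement texts abbreviated).

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class StatementType(Enum):
    PROCEDURAL = "procedural"
    DATA_FLOW = "data flow"
    FUNCTION = "function"


class DetailLevel(Enum):
    DETAILED = "detailed"
    PROGRAM = "program"
    DOMAIN = "domain"
    VAGUE = "vague"


@dataclass(frozen=True)
class CodedStatement:
    text: str
    statement_type: StatementType
    detail_level: DetailLevel


def proportions(statements, attribute):
    """Return the share of statements in each category of `attribute`."""
    if not statements:
        return {}
    counts = Counter(getattr(s, attribute) for s in statements)
    return {category: n / len(statements) for category, n in counts.items()}


# Hypothetical sample: one abbreviated statement per quoted excerpt.
sample = [
    CodedStatement("the program will read in the cable file",
                   StatementType.PROCEDURAL, DetailLevel.DETAILED),
    CodedStatement("two files are combined to create a third file",
                   StatementType.DATA_FLOW, DetailLevel.PROGRAM),
    CodedStatement("the program is computing area for cable accesses",
                   StatementType.FUNCTION, DetailLevel.DOMAIN),
]
by_type = proportions(sample, "statement_type")
by_level = proportions(sample, "detail_level")
```

Because every statement receives both a type and a level, the same set of
statements can be tabulated twice, once along each dimension.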

The foregoing examples were chosen for illustrative purposes because they
contained a concentration of particular types of statements at a particular
level of detail. Most summaries contained a mixture of statement types and
levels but can be summarized in terms of general trends across subjects, and
comparisons can be made between languages, comprehension quartiles, and
talk-aloud conditions.

In terms of statement type, the majority (57%) of programmers' summary
statements were classified as procedural, 30% were data flow/data structure
statements, and 13% were function statements, F(2,64) = 2 ^ . 31, p < .001.
This pattern did not differ by quartile, by language, or by talk-aloud
condition. In terms of the level of detail, classifying the same 100% of the
summary statements in a second way, the predominant level was the program/file
level, accounting for 38% of the statements; 18% of the statements were
detailed, 21% were vague, and the remainder were specified at the domain
level, F(3,96) = 10.47, p < .001. This pattern across level of detail differed
for upper and lower quartile subjects, F(3,96) = 4.65, p < .01, with lower
quartile comprehenders' summaries containing relatively more statements at a
detailed level (20% Q4 versus 16% Q1) and more statements at a vague level
(30% Q4 versus 14% Q1). A final observation concerns a relation observed
between summary statement type and level. A majority of program summary
statements about program function were expressed in the language of real-world
objects (cables, space, crowding, etc.) rather than in the language of
programs. The majority of procedural summary statements were expressed in
terms of program objects (files, computations, searching, etc.) rather than in
the domain language.

Comprehension after modification phase. Looking again at comprehension
errors for different information categories, the comprehension pattern shifts
on the second comprehension test after the modification task,
F(3,96) = [value illegible], p < .001 (comprehension trial by information
category interaction in an ANOVA treating comprehension trial as a repeated
measure). The pattern of errors for this trial (see Figure 11) shows the
fewest errors for data flow and function questions with more errors on control
flow questions, F(3,96) = 14.85, p < .001. Furthermore, this pattern is more
exaggerated for the programmers who talked aloud while working,
F(3,96) = 5.^3, p < .001 (see Figure 11). Although second trial patterns are
exaggerated for Talk subjects, overall comprehension accuracy for Talk and
Notalk subjects was roughly equivalent, F(1,32) = 1.01. Patterns of errors
across information category did not differ by comprehension quartile or by
language.


Insert Figure 11 about here

Discussion

Comprehension results from Study Two (Figure 10) reinforce and extend the
conclusions from Study One that the understanding of program control flow and
procedures precedes understanding of program functions. This pattern of
comprehension results appeared even in the context of a longer, partially
documented program after a lengthy study period. Analyses of program
summaries also support this conclusion by showing a preponderance of
procedural summary statements over data flow and function statements.
It is important to note that the story of program comprehension does not
end with the establishment of a procedural representation. In Study Two a
different comprehension pattern emerged after a second exposure to the program
during which programmers completed a program modification (Figure 11). After
the modification task, there was a marked shift toward increased comprehension
of program function and data flow at the apparent expense of control flow
information, and this shift was more extreme for programmers who were asked to
think aloud while working. This suggests that either the additional time or
the goal of modifying the program resulted in a change in the dominant memory
representation. The fact that talking aloud while working enhanced this shift
suggests that task effects, rather than the extra time alone, are responsible.

One way to understand this shift in comprehension patterns is to go back
to theories of text comprehension and speculate about a construct, introduced
by van Dijk and Kintsch (1983), that they call a situation model. In this
(1983) work, van Dijk and Kintsch suggest that two distinct but
cross-referenced representations of a text are constructed during
comprehension. The first representation, the textbase, includes the hierarchy
of representations, described in the introduction to the present paper,
consisting of a surface memory of the text, a microstructure of interrelations
between text propositions, and a macrostructure that organizes the text
representation. The second representation, the situation model, is a mental
model (e.g., Johnson-Laird, 1983) of what the text is about referentially. In
our context, the program text in Study Two is conceptually about searches,
merges, computations, and so forth; referentially, it is about cables that
take up space, finding out how big a particular cable is, computing the total
size of the cables allocated to a particular space, comparing the cable
allocation to the size of the space, etc. It is plausible that the functional
relations between program procedures are more comprehensible in the terms of
the real world objects. Thus, the textbase macrostructure may be dominated by
procedural relations that largely reflect how programs in traditional
languages are structured. The functional hierarchy can be developed with
reference to a situation model expressed in terms of the real world objects.
Data from our analysis of program summaries are consistent with this idea.
Procedural summary statements were most often expressed in terms of program
concepts and functional summary statements were most often expressed in terms
of the real world object domain.

Van Dijk and Kintsch (1983) also suggest that the construction of the
situation model depends on construction of the textbase in the sense that the
textbase defines the actions and events that need explaining. This is
consistent with our findings in both studies that procedural representations
precede functional representations. In fact our results suggest that both
time and incentive (talking aloud to an experimenter and having to do a
modification) are involved in the successful construction of a functionally
based situation model. If this analysis is correct, we could imagine
conditions that might assist and speed up the extraction of program function
and the construction of a functional representation. For example,
documentation concerning the real world domain and the relation of program
procedures to the domain might promote a simultaneous construction of both
kinds of understanding.

One final aspect of the results of Study Two deserves comment.
Comprehension quartile as determined by comprehension scores in the
experimental setting of Study One predicted the comprehension scores in the
more natural task of Study Two. However, the error rates on comprehension
questions for both upper and lower quartile comprehenders were quite high in
Study Two, even after 1.25 hours of study and modification. Is this cause for
practical concern, considering the fact that we are studying professional
programmers with an average of 10 years of experience -- people who are
responsible for the programs that help design buildings, monitor space
programs, keep track of bank balances, control defense systems and so on? The
high error rates are not by themselves cause for concern because programmers
were answering questions without reference to the program listing. It does
not necessarily follow that the same errors would be made if subjects could
have "looked up" the answers in the program. Greater concern would be
warranted if we found that the high error rates were accompanied by great
confidence in level of understanding, a measure we did not collect. But our
casual observations of subjects who talked while working suggest that this may
have been the case for some of the programmers.

GENERAL DISCUSSION

At the outset we presented an analysis of computer program texts in terms
of multiple abstractions of the text to illustrate different relations between
parts of programs. Specific abstractions expressing important relations in
the design of computer programs include a goal hierarchy highlighting major
functional achievements of a program (Figure 2), a data flow abstraction
highlighting the transformations that are applied to data objects (Figure 3),
a control flow abstraction highlighting the temporal sequence of execution of
program actions (Figure 4), and a conditionalized action representation
specifying states of the program and the actions invoked (Figure 5). Although
this analysis is specific to computer programming, analogies can be
developed for closely related tasks that involve other kinds of texts such as
instructional texts, and more distant analogies for design tasks in which other
kinds of relations are more central.

The views of computer program comprehension contrasted throughout this
report, based on analyses of plan knowledge (PK) and text structure knowledge
(TS), represent claims about which kinds of relations outlined in the multiple
abstractions analysis play a central organizing role in program comprehension.
PK theory suggests that data flow and function relations will be dominant and
TS theory suggests that control flow or procedural relations will be central.
More generally, these two views represent positions about the role of
different kinds of knowledge in comprehension. TS theory emphasizes the role
of abstract knowledge of program text structures while PK theory emphasizes
the role of a large collection of content-dependent knowledge that links
specific program functions to plans that achieve them.

The present research results strongly support a view of program
comprehension in which abstract knowledge of program text structures plays the
initial organizing role in memory for programs, and in which control flow or
procedural relations dominate in the macrostructure memory representation.
These results are consistent with conclusions reached by researchers in other
text comprehension domains who suggest that knowledge of narrative and
expository text structures guides comprehension processing and plays an
important role above and beyond other content schematic factors (e.g.,
Carpenter & Just, 1981; Cirilo & Foss, 1980; Haberlandt, Berian, & Sandson,
1980; van Dijk & Kintsch, 1983; Johnson & Mandler, 1980; Kieras, 1985;
Mandler, 1978, 1989; Mandler & Johnson, 1977; Rumelhart, 1975, 1980; Stein &
Glenn, 1979; Thorndyke, 1977). These results are not consistent with
conclusions suggesting such knowledge is not involved in comprehension in
domains where extensive content knowledge may be available (e.g., Black &
Bower, 1980; Black & Wilensky, 1979; Bruce, 1980; Schank & Abelson, 1977;
Thorndyke & Yekovich, 1980). Thus, just as there is good evidence that "episodes"
function as psychological units in story comprehension, there is also good
evidence that structured programming building blocks function as psychological
units in program comprehension.

In terms of the multiple abstractions analysis, programmers' mental
representations in this research were closest to the procedural representation
(Figure 9) based on control flow relationships. Should we then conclude that
a procedural form is the "natural" mental representation? In the current
research, we originally expected that mental representations would show
function and data flow relations to be primary. If that had occurred, then
there would be ample ground to claim that these relations reflected a
"natural" or preferred cognitive organization because text and language
structure as well as the programmers' training combine to highlight procedural
relations. However, given the current results, we are not sure whether the
mental organization reflects language/text structure and training, or
cognitive "naturalness," or both. There is some evidence from research on the
comprehension of procedural instructions that the memory structure reflects
procedural relations rather than functional relations whether or not the text
from which the procedure is learned has a procedural form (Smith & Spoehr,
1989). On the other hand, the language differences found in the present
research suggest that language structure will matter, that the form in which
it is convenient to mentally represent a design will be a form that is closely
related to the structure of the stimulus. This is consistent with an emphasis
that was popular in earlier problem solving research: stimulus structures are
a major influence on the form of mental representations, even for logically
isomorphic problems (Hayes & Simon, 1977).

We also found evidence that in later stages of program comprehension,
under appropriate task conditions, a second representation is available that
reflects the functional structure of the program and is expressed in the
language of the real world domain to which the program is applied. Our
explanations for this later, task-related shift in comprehension are
speculative and draw on the concept of a situation model representation of the
program that is distinct from the macrostructure organization of the textbase
(van Dijk & Kintsch, 1983). What is clear from our research is that this
second, functional representation is not constructed quickly or automatically.
Programmers required extensive involvement with the program before being able
to use this structure to respond to questions about the program. Further
research is needed to explore the viability of the situation model explanation
and the extent to which changes in stimulus structures will alter the time
course of its emergence.

Footnotes

1. Contention over the number of conceptual units concerns whether
variations on looping structures should be recognized as separate constructs.
These controversies do not affect the discussion.

2. It can be less since the programmer may know that a single command in
the programming language executes one or more actions.

3. Data from 6 additional subjects were discarded due to programmer
difficulty with English (1), mechanical problems during the course of the
experiment resulting in incomplete data (3), and motivational problems in
completing the experiment (2).

4. A response time for a given subject and item was labeled extreme if it
was more than 2.5 standard deviations from the subject's mean response time
over correct responses, and if it was simultaneously greater than 2.5 standard
deviations from the mean response time for that particular item computed over
subjects in the same subject group. In addition, all response times greater
than 10.0 seconds were considered extreme. About 1.9% of the response times
were identified as extreme, and their removal lowered the average response time
by about 150 milliseconds and reduced variability. For example, the "cleaned"
average response time for correct "yes" items was 2.512 seconds compared to a
2.670 uncleaned mean. All analyses were performed on cleaned and uncleaned
data, and in no case was the direction of differences between means altered by
the removal of extreme response times.
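
The trimming rule in this footnote can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the procedure as
it was actually run: the footnote does not say whether deviations were
two-sided, whether sample or population standard deviations were used, or
whether the item means were restricted to correct responses. The sketch
assumes two-sided deviations, sample standard deviations, and item means over
all responses in the subject group. The function name and the trial record
keys (subject, group, item, rt, correct) are hypothetical.

```python
from statistics import mean, stdev


def flag_extreme(trials, z_cut=2.5, ceiling=10.0):
    """Return one boolean per trial: True if the response time is 'extreme'.

    Each trial is a dict with hypothetical keys:
    subject, group, item, rt (seconds), correct (bool).
    """
    by_subject = {}     # subject -> RTs of that subject's correct responses
    by_item_group = {}  # (item, group) -> RTs for that item within the group
    for t in trials:
        if t["correct"]:
            by_subject.setdefault(t["subject"], []).append(t["rt"])
        by_item_group.setdefault((t["item"], t["group"]), []).append(t["rt"])

    def far_from_mean(rt, values):
        # Assumption: the 2.5 SD criterion is two-sided and uses the
        # sample standard deviation.
        if len(values) < 2:
            return False
        sd = stdev(values)
        return sd > 0 and abs(rt - mean(values)) > z_cut * sd

    flags = []
    for t in trials:
        rt = t["rt"]
        # Both the subject criterion and the item criterion must hold.
        both = (far_from_mean(rt, by_subject.get(t["subject"], []))
                and far_from_mean(rt, by_item_group[(t["item"], t["group"])]))
        # Any response time above the ceiling is extreme regardless.
        flags.append(rt > ceiling or both)
    return flags
```

Requiring both criteria at once means that a slow response to a generally
slow item, or by a generally slow subject, is not removed on that basis alone.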

5. To some extent these predictions are dependent on the time course of
comprehension so that more errors would be made earlier about less accessible
information. Analyses of comprehension questions by presentation position
were not informative due to the small number of items per cell at this level
of analysis.

6. Lengths of the comprehension questions varied (from 8 to 19 syllables,
mean = 13.2 syllables), so response times were adjusted for reading time in
order to compare response times between different information category
question sets, as follows: response times were standardized for each subject,
and the response time predicted by a subject's syllables/response time correlation
was subtracted out; the remainder is a standardized residual "due to
thinking." No differences in results occur using other common methods of
adjustment.
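
One way to read the adjustment described in this footnote is as a per-subject
simple regression of standardized response time on question length in
syllables, keeping the residual. The sketch below follows that reading. It is
an assumption-laden illustration: the footnote does not state the exact
computation, the function name is hypothetical, and the residual here is left
on the standardized response time scale without being rescaled again.

```python
from statistics import mean, stdev


def reading_time_residuals(syllables, rts):
    """Residual of standardized RT after removing the linear effect of length.

    syllables and rts are parallel lists for ONE subject. They must contain
    at least two distinct syllable counts and two distinct response times.
    """
    # Step 1: standardize this subject's response times.
    m, sd = mean(rts), stdev(rts)
    z = [(rt - m) / sd for rt in rts]

    # Step 2: least-squares line predicting standardized RT from syllables.
    sx, sz = mean(syllables), mean(z)
    sxx = sum((s - sx) ** 2 for s in syllables)
    sxz = sum((s - sx) * (zi - sz) for s, zi in zip(syllables, z))
    slope = sxz / sxx
    intercept = sz - slope * sx

    # Step 3: subtract the predicted value; what remains is the part of the
    # response time not accounted for by question length.
    return [zi - (intercept + slope * s) for s, zi in zip(syllables, z)]
```

If a subject's response times were a perfectly linear function of question
length, every residual would be zero, which is the sense in which the
remainder reflects time "due to thinking" rather than reading.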
7. This does not preclude priming due to some other unknown form of
relatedness that coincidentally was confounded with TS unit membership.
However, this "other" basis would also have to account for converging
comprehension question results and for results obtained in the second study.
Table 1

Correspondences Between Text Abstractions,
Knowledge Structures, and Mental Representations


TEXT                  KNOWLEDGE             MENTAL
RELATIONS             STRUCTURES            REPRESENTATION

Control Flow          Text Structure        Procedural Episodes

Function              Plan Knowledge        Functional Representation
Data Flow

Condition-Action      Unknown               Unknown
Table 2

Mean Response Times for Target Recognition Test Items
as a Function of Prime Type


A. All subjects, response time in seconds

                      FORTRAN                           COBOL
                      SUBJECTS AND MATERIALS            SUBJECTS AND MATERIALS
Subject Group
Within Language       PK Prime        TS Prime          PK Prime        TS Prime

Subjects Group 1      2.691           2.695             2.526           2.834
                      (A Materials)   (B Materials)     (A Materials)   (B Materials)

Subjects Group 2      2.268           1.972             3.048           2.594
                      (B Materials)   (A Materials)     (B Materials)   (A Materials)

All Subjects          2.470           2.333             2.787           2.714


B. Upper quartile (Q1) comprehension subjects, response time in seconds

                      FORTRAN                           COBOL
                      SUBJECTS AND MATERIALS            SUBJECTS AND MATERIALS

Subject Group         PK Prime        TS Prime          PK Prime        TS Prime

Q1 Subjects Group 1   2.560           2.463             2.667           2.948
                      (A Materials)   (B Materials)     (A Materials)   (B Materials)

Q1 Subjects Group 2   2.220           1.807             2.780           2.063
                      (B Materials)   (A Materials)     (B Materials)   (A Materials)

All Q1 Subjects       2.391           2.135             2.724           2.505
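
As a reading aid for Table 2, the sketch below computes the difference
between the PK-prime and TS-prime means in the "All Subjects" and "All Q1
Subjects" rows. A positive difference means that targets were recognized
faster after a TS prime than after a PK prime. The numbers are copied from
the table; the dictionary layout and the function name are illustrative
assumptions, and the differences are descriptive only, not a substitute for
the statistical tests reported in the text.

```python
# (PK prime mean, TS prime mean) in seconds, copied from Table 2.
TABLE_2_SUMMARY = {
    ("All Subjects", "FORTRAN"): (2.470, 2.333),
    ("All Subjects", "COBOL"): (2.787, 2.714),
    ("All Q1 Subjects", "FORTRAN"): (2.391, 2.135),
    ("All Q1 Subjects", "COBOL"): (2.724, 2.505),
}


def ts_advantage(summary):
    """Return PK-prime mean minus TS-prime mean (seconds) for each row."""
    return {key: round(pk - ts, 3) for key, (pk, ts) in summary.items()}


if __name__ == "__main__":
    for (rows, language), diff in ts_advantage(TABLE_2_SUMMARY).items():
        print(f"{rows:16s} {language:8s} PK - TS = {diff:.3f} s")
```

On these summary rows the difference is positive in all four cells and is
larger for the upper quartile comprehension subjects than for all subjects.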